METHOD FOR SELECTING MEDICAL AND BIOCHEMICAL DIAGNOSTIC
TESTS USING NEURAL NETWORK-RELATED APPLICATIONS

This application is a continuation of allowed U.S. application Serial No. 08/912,133, filed August 14, 1997, to Jerome Lapointe and Duane DeSieno, entitled "METHOD FOR SELECTING MEDICAL AND BIOCHEMICAL DIAGNOSTIC TESTS USING NEURAL NETWORK-RELATED APPLICATIONS." This application is a continuation-in-part of U.S. application Serial No. 08/798,306, filed February 7, 1997, entitled "METHOD FOR SELECTING MEDICAL AND BIOCHEMICAL DIAGNOSTIC TESTS USING NEURAL NETWORK-RELATED APPLICATIONS" to Jerome Lapointe and Duane DeSieno. This application is also a continuation-in-part of U.S. application Serial No. 08/599,275, filed February 9, 1996, to Jerome Lapointe and Duane DeSieno, entitled "METHOD FOR DEVELOPING MEDICAL AND BIOCHEMICAL DIAGNOSTIC TESTS USING NEURAL NETWORKS." U.S. application Serial No. 08/798,306 is a continuation-in-part of U.S. application Serial No. 08/599,275. This application and U.S. application Serial No. 08/798,306 claim the benefit of priority under 35 U.S.C. §119(e) to U.S. provisional application Serial No. 60/011,449, entitled "METHOD AND APPARATUS FOR AIDING IN THE DIAGNOSIS OF ENDOMETRIOSIS USING A PLURALITY OF PARAMETERS SUITED FOR ANALYSIS THROUGH A NEURAL NETWORK" to Jerome Lapointe and Duane DeSieno, filed February 9, 1996.

The subject matter of each of the above-noted applications and provisional application is herein incorporated in its entirety by reference thereto.
APPENDICES

Three computer Appendices containing computer program source code for programs described herein have been submitted concurrently with the filing of this application. The Computer Appendices were converted to Computer Program Listing Compact Disk Appendices pursuant to 37 C.F.R. 1.96(b). Appendices I, II, and III are on compact disks, copy 1 and copy 2, and stored under the file name Appenixl-lll.txt, 392KB, created on 01/10/02. The compact disks, copy 1 and copy 2, are identical. The information submitted on the Compact Disk is in compliance with the American Standard Code for Information Interchange (ASCII) in the IBM-PC machine format compatible with the MS-Windows operating system. The Computer Appendices, which are referred to hereafter as the "Compact Disk Appendices", are each incorporated herein by reference in its entirety.

Thus, a portion of the disclosure of this patent document contains 
material that is subject to copyright protection. The copyright owner has 
no objection to the facsimile reproduction by anyone of the patent 
document or patent disclosure, as it appears in the Patent and Trademark 
Office patent file or records, but otherwise reserves all copyright rights 
whatsoever.
FIELD OF THE INVENTION

The subject matter of the invention relates to the use of prediction technology, particularly nonlinear prediction technology, for the development of medical diagnostic aids. In particular, training techniques operative on neural networks and other expert systems with inputs from patient historical information for the development of medical diagnostic tools, and methods of diagnosis, are provided.
BACKGROUND OF THE INVENTION 

Data Mining, decision-support systems and neural networks

A number of computer decision-support systems have the ability to classify information and identify patterns in input data, and are particularly useful in evaluating data sets having large quantities of variables and complex interactions between variables. These computer decision systems, which are collectively identified as "data mining" or "knowledge discovery in databases" (and herein as decision-support systems), rely on similar basic hardware components, e.g., personal computers (PCs) with a processor, internal and peripheral devices, memory devices and input/output interfaces. The distinctions between the systems arise within the software and, more fundamentally, the paradigms upon which the software is based. Paradigms that provide decision-support functions include regression methods, decision trees, discriminant analysis, pattern recognition, Bayesian decision theory, and fuzzy logic. One of the more widely used decision-support computer systems is the artificial neural network.

Artificial neural networks or "neural nets" are parallel information processing tools in which individual processing elements called neurons are arrayed in layers and furnished with a large number of interconnections between elements in successive layers. The functioning of the processing elements is modeled to approximate biologic neurons, where the output of the processing element is determined by a typically non-linear transfer function. In a typical model for neural networks, the processing elements are arranged into an input layer for elements which receive inputs, an output layer containing one or more elements which generate an output, and one or more hidden layers of elements therebetween. The hidden layers provide the means by which non-linear problems may be solved. Within a processing element, the input signals to the element are weighted arithmetically according to a weight coefficient associated with each input. The resulting weighted sum is transformed by a selected non-linear transfer function, such as a sigmoid function, to produce an output, whose values range from 0 to 1, for each processing element. The learning process, called "training", is a trial-and-error process involving a series of iterative adjustments to the processing element weights so that a particular processing element provides an output which, when combined with the outputs of other processing elements, generates a result which minimizes the resulting error between the outputs of the neural network and the desired outputs as represented in the training data. Adjustment of the element weights is triggered by error signals. Training data are described as a number of training examples in which each example contains a set of input values to be presented to the neural network and an associated set of desired output values.
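The arithmetic of a single processing element, and of a forward pass through one hidden layer, can be sketched as follows. This is a minimal illustration rather than the networks described herein: it assumes a logistic sigmoid transfer function, exactly one hidden layer, and a bias weight per element (an added assumption; the text above does not mention biases). All weights and input values are made-up numbers, not trained values.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic transfer function; its output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def processing_element(inputs, weights, bias=0.0):
    """Weight each input, sum the results, and pass the sum through the sigmoid."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(total)

def forward(inputs, hidden_layer, output_layer):
    """Forward pass through one hidden layer and one output layer.

    Each layer is a list of (weights, bias) pairs, one pair per processing element.
    """
    hidden = [processing_element(inputs, w, b) for w, b in hidden_layer]
    return [processing_element(hidden, w, b) for w, b in output_layer]

# One hypothetical training example: a set of input values and the desired output.
example = {"inputs": [0.2, 0.7, 1.0], "target": [1.0]}
hidden_layer = [([0.5, -0.3, 0.8], 0.1), ([-0.6, 0.9, 0.2], -0.2)]
output_layer = [([1.2, -0.7], 0.05)]

output = forward(example["inputs"], hidden_layer, output_layer)
# Squared error between the network output and the desired output.
error = sum((t - o) ** 2 for t, o in zip(example["target"], output))
```

Training consists of repeatedly adjusting the numbers in `hidden_layer` and `output_layer` so that `error`, summed over all training examples, becomes small.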



A common training method is backpropagation or "backprop", in which error signals are propagated backwards through the network. The error signal is used to determine the error gradient and how much any given element's weight is to be changed, with the goal being to converge to a global minimum of the mean squared error. The path toward convergence, i.e., the gradient descent, is taken in steps, each step being an adjustment of the input weights of the processing element. The size of each step is determined by the learning rate. The error surface along the path of the gradient descent includes flat and steep regions with valleys that act as local minima, giving the false impression that convergence has been achieved and leading to an inaccurate result.
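The backpropagation procedure described above can be sketched as follows. This is a minimal sketch of plain online backpropagation, assuming one hidden layer, sigmoid transfer functions, a single output element, a squared-error objective and a constant bias input; the function name `train_backprop` and its default parameters are hypothetical, and no momentum term is included.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_backprop(data, n_hidden=3, learning_rate=0.5, epochs=2000, seed=0):
    """Online backpropagation for one hidden layer and one sigmoid output.

    data: list of (inputs, target) pairs, with target in [0, 1].
    Returns hidden weights, output weights, and the mean squared error of each
    epoch (each example's error is measured just before its weight update).
    """
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # w_h[j][i] links input i to hidden element j; the last column is a bias weight.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    history = []
    for _ in range(epochs):
        sq_err = 0.0
        for x, t in data:
            xb = list(x) + [1.0]  # constant bias input
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_h]
            hb = h + [1.0]
            o = sigmoid(sum(w * v for w, v in zip(w_o, hb)))
            sq_err += (t - o) ** 2
            # Error signal at the output element (gradient of 0.5 * (t - o) ** 2).
            delta_o = (o - t) * o * (1.0 - o)
            # Propagate the error signal backwards to each hidden element.
            delta_h = [delta_o * w_o[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            # Gradient-descent step whose size is set by the learning rate.
            for j in range(n_hidden + 1):
                w_o[j] -= learning_rate * delta_o * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_h[j][i] -= learning_rate * delta_h[j] * xb[i]
        history.append(sq_err / len(data))
    return w_h, w_o, history

# Usage: learn the logical OR of two inputs.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
w_h, w_o, history = train_backprop(data)
```

The hidden-layer error signals are computed before any output weight is changed, so every weight update in a step uses the same network state.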
Some variants of backprop incorporate a momentum term in which a proportion of the previous weight-change value is added to the current value. This adds momentum to the algorithm's trajectory in its gradient descent, which may prevent it from becoming "trapped" in local minima. One backpropagation method which includes a momentum term is "Quickprop", in which the momentum rates are adaptive. The Quickprop variation is described by Fahlman (see "Fast Learning Variations on Back-Propagation: An Empirical Study", Proceedings of the 1988 Connectionist Models Summer School, Pittsburgh, 1988, D. Touretzky, et al., eds., pp. 38-51, Morgan Kaufmann, San Mateo, CA; and, with Lebiere, "The Cascade-Correlation Learning Architecture", Advances in Neural Information Processing Systems 2, (Denver, 1989), D. Touretzky, ed., pp. 524-32, Morgan Kaufmann, San Mateo, CA). The Quickprop algorithm is publicly accessible and may be downloaded via the Internet from the Artificial Intelligence Repository maintained by the School of Computer Science at Carnegie Mellon University. In Quickprop, a dynamic momentum rate is calculated based upon the slope of the gradient. If the slope is smaller but has the same sign as the slope following the immediately preceding weight adjustment, the weight change will accelerate. The acceleration rate is determined by the magnitude of successive differences between slope values. If the current slope is in the opposite direction from the previous slope, the weight change decelerates. The Quickprop method improves convergence speed, giving the steepest possible gradient descent and helping to prevent convergence to a local minimum.
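The two weight-update rules above can be contrasted for a single weight. This is a minimal sketch, not Fahlman's published code: `momentum_step` is the standard momentum update, and `quickprop_step` is a simplified reading of the Quickprop idea, which treats the error curve for each weight as a parabola and jumps toward its minimum using the current and previous slopes. The growth limit of 1.75 is a commonly cited default and is an assumption here, and refinements of the published algorithm are omitted. The toy error function E(w) = (w - 3)^2 is hypothetical.

```python
def momentum_step(slope, prev_delta, learning_rate=0.1, momentum=0.9):
    """Gradient step plus a proportion of the previous weight change:
    delta(t) = -learning_rate * dE/dw + momentum * delta(t - 1)."""
    return -learning_rate * slope + momentum * prev_delta

def quickprop_step(slope, prev_slope, prev_delta, learning_rate=0.1, max_growth=1.75):
    """Simplified Quickprop weight change (a sketch after Fahlman, 1988).

    Fits a parabola through the current and previous slopes and steps toward
    its minimum: delta(t) = slope / (prev_slope - slope) * delta(t - 1).
    A step may grow to at most max_growth times the previous step (assumed limit).
    """
    if prev_delta == 0.0 or prev_slope == slope:
        return -learning_rate * slope  # fall back to a plain gradient-descent step
    step = slope / (prev_slope - slope) * prev_delta
    limit = max_growth * abs(prev_delta)
    return max(-limit, min(limit, step))

def toy_slope(w):
    """dE/dw for the hypothetical error function E(w) = (w - 3) ** 2."""
    return 2.0 * (w - 3.0)

w, prev_delta, prev_slope = 0.0, 0.0, 0.0
for _ in range(5):
    s = toy_slope(w)
    delta = quickprop_step(s, prev_slope, prev_delta)
    w, prev_delta, prev_slope = w + delta, delta, s
```

Because the toy error curve really is a parabola, the Quickprop steps reach its minimum at w = 3 within a few iterations; plain gradient descent with the same learning rate would approach it only gradually.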

When neural networks are trained on sufficient training data, the neural network acts as an associative memory that is able to generalize to a correct solution for sets of new input data that were not part of the training data. Neural networks have been shown to be able to operate even in the absence of complete data or in the presence of noise. It has also been observed that the performance of the network on new or test data tends to be lower than the performance on training data. The difference in the performance on test data indicates the extent to which the network was able to generalize from the training data. A neural network, however, can be retrained and thus learn from the new data, improving the overall performance of the network.

Neural nets, thus, have characteristics that make them well suited for a large number of different problems, including areas involving prediction, such as medical diagnosis.

Neural Nets and Diagnosis

In diagnosing and/or treating a patient, a physician will use patient condition, symptoms, and the results of applicable medical diagnostic tests to identify the disease state or condition of the patient. The physician must carefully determine the relevance of the symptoms and test results to the particular diagnosis and use judgement based on experience and intuition in making a particular diagnosis. Medical diagnosis involves integration of information from several sources, including a medical history, a physical exam and biochemical tests. Based upon the results of the exam and tests and answers to the questions, the physician, using his or her training, experience, knowledge and expertise, formulates a diagnosis. A final diagnosis may require subsequent surgical procedures to verify or to formulate. Thus, the process of diagnosis involves a combination of decision-support, intuition and experience. The validity of a physician's diagnosis is very dependent upon his/her experience and ability.

Because of the predictive and intuitive nature of medical diagnosis, attempts have been made to develop neural networks and other expert systems that aid in this process. The application of neural networks to medical diagnosis has been reported. For example, neural networks have been used to aid in the diagnosis of cardiovascular disorders (see, e.g., Baxt (1991) "Use of an Artificial Neural Network for the Diagnosis of Myocardial Infarction," Annals of Internal Medicine 115:843; Baxt (1992) "Improving the Accuracy of an Artificial Neural Network Using Multiple Differently Trained Networks," Neural Computation 4:772; Baxt (1992) "Analysis of the clinical variables that drive decision in an artificial neural network trained to identify the presence of myocardial infarction," Annals of Emergency Medicine 21:1439; and Baxt (1994) "Complexity, chaos and human physiology: the justification for non-linear neural computational analysis," Cancer Letters 77:85). Other medical diagnostic applications include the use of neural networks for cancer diagnosis (see, e.g., Maclin, et al. (1991) "Using Neural Networks to Diagnose Cancer," Journal of Medical Systems 15:11-9; Rogers, et al. (1994) "Artificial Neural Networks for Early Detection and Diagnosis of Cancer," Cancer Letters 77:79-83; Wilding, et al. (1994) "Application of Backpropagation Neural Networks to Diagnosis of Breast and Ovarian Cancer," Cancer Letters 77:145-53), neuromuscular disorders (Pattichis, et al. (1995) "Neural Network Models in EMG Diagnosis," IEEE Transactions on Biomedical Engineering 42:5:486-495), and chronic fatigue syndrome (Solms, et al. (1996) "A Neural Network Diagnostic Tool for the Chronic Fatigue Syndrome," International Conference on Neural Networks, Paper No. 108). These methodologies, however, fail to address significant issues relating to the development of practical diagnostic tests for a wide range of conditions and do not address the selection of input variables.

Computerized decision-support methods other than neural networks have been reported for their applications in medical diagnostics, including knowledge-based expert systems such as MYCIN (Davis, et al., "Production Systems as a Representation for a Knowledge-based Consultation Program," Artificial Intelligence, 1977; 8:1:15-45) and its progeny TEIRESIAS, EMYCIN, PUFF, CENTAUR, VM, GUIDON, SACON, ONCOCIN and ROGET. MYCIN is an interactive program that diagnoses certain infectious diseases and prescribes anti-microbial therapy. Such knowledge-based systems contain factual knowledge and rules or other methods for using that knowledge, with all of the information and rules being pre-programmed into the system's memory rather than the system developing its own procedure for reaching the desired result based upon input data, as in neural networks. Another computerized diagnosis method is the Bayesian network, also known as a belief or causal probabilistic network, which classifies patterns based on probability density functions from training patterns and a priori information. Bayesian decision systems are reported for uses in interpretation of mammograms for diagnosing breast cancer (Roberts, et al., "MammoNet: A Bayesian Network diagnosing Breast Cancer," Midwest Artificial Intelligence and Cognitive Science Society Conference, Carbondale, IL, April 1995) and hypertension (Blinowska, et al. (1993) "Diagnostica -- A Bayesian Decision-Aid System - Applied to Hypertension Diagnosis," IEEE Transactions on Biomedical Engineering 40:230-35). Bayesian decision systems are somewhat limited in their reliance on linear relationships and in the number of input data points that can be handled, and may not be as well suited for decision-support involving non-linear relationships between variables. Implementation of Bayesian methods using the processing elements of a neural network can overcome some of these limitations (see, e.g., Penny, et al. (1996) "Neural Networks in Clinical Medicine," Medical Decision-support, 1996; 16:4:386-98). These methods have been used, by mimicking the physician, to diagnose disorders in which important variables are input into the system. It would, however, be of interest to use these systems to improve upon existing diagnostic procedures.
Endometriosis

Endometriosis is the growth of uterine-like tissue outside of the uterus. It affects about 15-30 percent of reproductive-age women. The cause(s) of endometriosis are not known, but it may result from retrograde menstruation, the reflux of endometrial tissue and cells (menstrual debris) from the uterus into the peritoneal cavity. While retrograde menstruation is thought to occur in most or all women, it is unclear why some women develop endometriosis and others do not.

Not all women with endometriosis exhibit symptoms or suffer from the disease. The extent or severity of endometriosis does not correlate with symptoms. Some women with severe disease may be completely asymptomatic, whereas others with minimal disease may suffer from excruciating pain. Symptoms that have been associated with endometriosis, such as infertility, pelvic pain, dysmenorrhea and past occurrence of endometriosis, often occur in women who do not have endometriosis. In other instances, these symptoms may be present and the women do have endometriosis. Although an association between these symptoms and endometriosis appears to exist, the correlation is far from perfect, and the interplay of these and other factors is complex. Clinicians often perform diagnostic laparoscopies on patients who they believe are excellent candidates for having endometriosis based on a combination of the above indications. Endometriosis, however, is not present in a significant proportion of these women. Thus, endometriosis represents an example of a disease state in which a physician must draw upon experience, using a complex set of information, to formulate a diagnosis. The validity of the diagnosis is related to the experience and ability of the physician.
As a result, determining if a woman has endometriosis from symptoms alone has not been possible. Within the medical community, the diagnosis of endometriosis is confirmed only by direct visualization of endometrial lesions during surgery. Many physicians impose a further restriction and demand that the suspected lesions be verified as being endometrial-like (glands and stroma) using histology on biopsied endometrial tissue. Thus, a non-invasive diagnostic test for endometriosis would be of significant benefit.

Therefore, it is an object herein to provide a non-invasive diagnostic aid for endometriosis. It is also an object herein to provide methods to select important variables to be used in decision-support systems to aid in the diagnosis of endometriosis and other disorders and conditions. It is also an object herein to identify new variables, to identify new biochemical tests and markers for diseases, and to design new diagnostic tests that improve upon existing diagnostic methodologies.
SUMMARY OF THE INVENTION

Methods using decision-support systems for the diagnosis of, and for aiding in the diagnosis of, diseases, disorders and other medical conditions are provided. The methods provided herein include a method of using patient history data and identification of important variables to develop a diagnostic test; a method for identification of important selected variables; a method of designing a diagnostic test; a method of evaluating the usefulness of a diagnostic test; a method of expanding the clinical utility of a diagnostic test; and a method of selecting a course of treatment by predicting the outcome of various possible treatments. Also provided are disease parameters or variables to aid in the diagnosis of disorders, including any disorders that are difficult to diagnose, such as endometriosis, to aid in predicting pregnancy-related events, such as the likelihood of delivery within a particular time period, and to aid in the diagnosis of other such disorders relevant to women's health. It is understood that although women's disorders are exemplified herein, the methods herein are applicable to any disorder or condition.

Also provided are means to use neural network training to guide the development of the tests to improve their sensitivity and specificity, and to select diagnostic tests that improve overall diagnosis of, or potential for, a disease state or medical condition. Finally, a method for evaluating the effectiveness of any given diagnostic test is described.

Thus, provided herein is a method for identifying variables or sets of variables that aid in the diagnosis of disorders or conditions. In the methods for identifying and selecting important variables and generating systems for diagnosis, patient data or information, typically patient history or clinical data, are collected and variables based on these data are identified. For example, the data include information for each patient regarding the number of pregnancies each patient has had. The extracted variable is, thus, number of pregnancies. The variables are analyzed by the decision-support systems, exemplified by neural networks, to identify important or relevant variables.
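Turning collected patient records into variables for a decision-support system can be sketched as follows. This is a minimal illustration under stated assumptions: the field names in `VARIABLES` are hypothetical, loosely based on history factors named in this document; yes/no answers are assumed to be stored as True/False; a missing answer is treated as 0; and the min-max scaling to the range 0 to 1 is one common input-preparation choice, not a step prescribed by the text.

```python
# Hypothetical variable names; a real questionnaire would define its own fields.
VARIABLES = ["age", "gravidity", "parity", "smoking_packs_per_day", "history_endometriosis"]

def encode(record):
    """Extract one numeric value per variable from a patient record (a dict).

    True/False answers become 1.0/0.0; a missing answer is treated as 0.0,
    which is an assumption rather than a prescribed rule.
    """
    return [float(record.get(name, 0)) for name in VARIABLES]

def scale_columns(vectors):
    """Min-max scale each variable to the range [0, 1] across the data set."""
    lows = [min(col) for col in zip(*vectors)]
    highs = [max(col) for col in zip(*vectors)]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(vec, lows, highs)]
        for vec in vectors
    ]

# Two hypothetical patient records.
patients = [
    {"age": 34, "gravidity": 2, "parity": 1, "history_endometriosis": True},
    {"age": 28, "gravidity": 0, "parity": 0, "smoking_packs_per_day": 0.5},
]
inputs = scale_columns([encode(p) for p in patients])
```

Here "gravidity" corresponds to the extracted variable "number of pregnancies" mentioned above; each row of `inputs` is one example ready to present to a network.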

Methods are provided for developing medical diagnostic tests using computer-based decision-support systems, such as neural networks and other adaptive processing systems (collectively, "data mining tools"). The neural networks or other such systems are trained on the patient data and observations collected from a group of test patients in whom the condition is known or suspected; a subset or subsets of relevant variables are identified through the use of a decision-support system or systems, such as a neural network or a consensus of neural networks; and another set of decision-support systems is trained on the identified subset(s) to produce a consensus decision-support system based test, such as a neural net-based test for the condition. The use of consensus systems, such as consensus neural networks, minimizes the negative effects of local minima in decision-support systems, such as neural network-based systems, thereby improving the accuracy of the system.
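A consensus of networks can be sketched as a combination of the outputs of several separately trained networks. This is a minimal sketch assuming that the consensus is a simple average of the network outputs followed by a threshold; the text does not specify the combination rule, so both the averaging and the 0.5 threshold are assumptions. Each "network" is modeled as any function from an input vector to an output between 0 and 1, and the three stand-in functions below are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence

# Assumed interface: a trained network maps an input vector to an output in [0, 1].
Network = Callable[[Sequence[float]], float]

def consensus_output(networks: Sequence[Network], inputs: Sequence[float]) -> float:
    """Average the outputs of several networks, e.g. trained from different random starts."""
    return mean(net(inputs) for net in networks)

def consensus_decision(networks: Sequence[Network], inputs: Sequence[float],
                       threshold: float = 0.5) -> bool:
    """Classify as positive when the averaged output exceeds the threshold."""
    return consensus_output(networks, inputs) > threshold

# Three stand-in "networks" that disagree; in practice each would be a separately trained net.
nets = [lambda x: 0.7, lambda x: 0.8, lambda x: 0.3]
```

Networks started from different random weights tend to settle in different local minima, so averaging their outputs damps the error contributed by any single poorly converged network.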

Also, to refine or improve performance, the patient data can be augmented by increasing the number of patients used. Biochemical test data and other data may also be included, either as part of additional examples or as additional variables prior to the variable selection process.

The resulting systems are used as an aid in diagnosis. In addition, as the systems are used, patient data can be stored and then used to further train the systems and to develop systems that are adapted for a particular genetic population. This inputting of additional data into the system may be implemented automatically or done manually. By doing so, the systems continually learn and adapt to the particular environment in which they are used. The resulting systems have numerous uses in addition to diagnosis, which include assessing the severity of a disease or disorder and predicting the outcome of a selected treatment protocol.

The systems may also be used to assess the value of other data in a diagnostic procedure, such as biochemical test data and other such data, and to identify new tests that are useful for diagnosing a particular disease.

Thus, also provided are methods for improving upon existing 
biochemical tests, identifying relevant biochemical tests and for 
developing new biochemical tests to aid in diagnosis of disorders and 
conditions. These methods involve assessing the effect of a particular 
test or a potential new test on the performance of the decision-support 
system based test. If addition of information from the test improves 
performance, such test will have relevance in diagnosis. 



The disorders and conditions that are of particular interest herein, and to which the methods herein may be readily applied, are gynecological conditions and other conditions that impact on fertility, including but not limited to endometriosis, infertility, prediction of pregnancy-related events, such as the likelihood of delivery within a particular time period, and pre-eclampsia. It is understood, however, that the methods herein are applicable to any disorder or condition.

The methods are exemplified with reference to neural networks; however, it is understood that other data mining tools, such as expert systems, fuzzy logic, decision trees, and other statistical decision-support systems, which are generally non-linear, may be used. Although the variables provided herein are intended to be used with decision-support systems, once the variables are identified, a person, typically a physician, armed with knowledge of the important variables can use them to aid in diagnosis in the absence of a decision-support system or by using a less complex linear system of analysis.

As shown herein, variables or combinations thereof that heretofore were not known to be important in aiding in diagnosis are identified. In addition, patient history data, without supplementing biochemical test data, can be used to diagnose or aid in diagnosing a disorder or condition when used with the decision-support systems, such as the neural nets provided herein. Furthermore, the accuracy of the diagnosis with or without biochemical data may be sufficient to obviate the need for invasive surgical diagnostic procedures.

Also provided herein is a method of identifying and expanding the clinical utility of a diagnostic test. The results of a particular test, particularly one that had heretofore not been considered of clinical utility with respect to the disorder or condition of interest, are combined with the variables and used with the decision-support system, such as a neural net. If the performance of the system, i.e., its ability to correctly diagnose a disorder, is improved by addition of the results of the test, then the test will have clinical utility or a new utility.

Similarly, the resulting systems can be used to identify new utilities for drugs or therapies and also to identify uses for particular drugs and therapies. For example, the systems can be used to select subpopulations of patients for whom a particular drug or therapy is effective. Thus, methods for expanding the indication for a drug or therapy and identifying new drugs and therapies are provided.

In specific embodiments, neural networks are employed to evaluate specific observation values and test results, to guide the development of biochemical or other diagnostic tests, and to provide the decision-support functionality for the test.

A method for identification of important variables (parameters) or sets thereof for use in the decision-support systems is also provided. This method, while exemplified herein with reference to medical diagnosis, has broad applicability in any field, such as financial analysis, in which important parameters or variables are selected from among a plurality.

In particular, a method for selecting effective combinations of variables is provided. The method involves: (1) providing a set of "n" candidate variables and a set of "selected important variables", which initially is empty; (2) ranking all candidate variables based on a chi square and sensitivity analysis; (3) taking the highest "m" ranked variables one at a time, where m is from 1 up to n, and evaluating each by training a consensus of neural nets on that variable combined with the current set of important variables; (4) selecting the best of the m variables, where the best variable is the one that gives the highest performance, and, if it improves performance in comparison to the performance of the selected important variables, adding it to the "selected important variable" set, removing it from the candidate set and continuing processing at step (3), otherwise going to step (5); (5) if all variables on the candidate set have been evaluated, the process is complete; otherwise, continuing by taking the next highest "m" ranked variables one at a time, evaluating each by training a consensus of neural nets on that variable combined with the current set of important selected variables, and performing step (4). The final set of important selected variables will contain a plurality of variables, typically three to five or more.
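Steps (1) through (5) can be sketched as a ranked forward-selection loop. This is a minimal sketch under stated assumptions: the ranking of step (2) is taken as already computed; `evaluate` is a hypothetical callable standing in for "train a consensus of neural nets on these variables and return its performance" (higher is better); and `baseline`, the performance credited to the empty selected set, is an assumption the text does not specify. The toy scoring function in the usage lines is purely illustrative.

```python
from typing import Callable, List, Sequence, Tuple

def select_variables(
    ranked: Sequence[str],
    evaluate: Callable[[List[str]], float],
    m: int,
    baseline: float = 0.0,
) -> Tuple[List[str], float]:
    """Forward selection over ranked candidates, following steps (1)-(5).

    ranked: candidate variables already sorted best-first (step 2).
    evaluate: hypothetical stand-in for training a consensus of networks on
        the given variables and returning a performance score.
    baseline: score credited to the empty selected set (an assumption).
    """
    selected: List[str] = []  # step (1): the selected set starts empty
    best_score = baseline
    candidates = list(ranked)
    start = 0
    while start < len(candidates):  # step (5): stop once every candidate is evaluated
        window = candidates[start:start + m]  # step (3): next m ranked candidates
        scores = {v: evaluate(selected + [v]) for v in window}
        best = max(window, key=lambda v: scores[v])  # step (4): best of the m
        if scores[best] > best_score:
            selected.append(best)
            best_score = scores[best]
            candidates.remove(best)
            start = 0  # back to step (3) with the top-ranked remaining candidates
        else:
            start += m  # step (5): move on to the next m ranked candidates
    return selected, best_score

# Usage with a toy scoring function in which only "a" and "c" carry information.
gains = {"a": 0.3, "c": 0.2}
toy_evaluate = lambda vs: 0.5 + sum(gains.get(v, 0.0) for v in vs)
chosen, score = select_variables(["b", "a", "d", "c", "e"], toy_evaluate, m=2, baseline=0.5)
```

A variable is kept only if it strictly improves on the current best score, so variables that add nothing (here "b", "d" and "e") are never selected, even when they are ranked highly.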

In a particular embodiment, the sensitivity analysis involves: (k) determining an average observation value for each of the variables in an observation data set; (l) selecting a training example, and running the example through a decision-support system to produce an output value, designated and stored as the normal output; (m) selecting a first variable in the selected training example, replacing the observation value with the average observation value of the first variable, running the modified example in the decision-support system in the forward mode, and recording the output as the modified output; (n) squaring the difference between the normal output and the modified output and accumulating it as a total for each variable, in which this total is designated the selected variable total for each variable; (o) repeating steps (m) and (n) for each variable in the example; and (p) repeating steps (l)-(n) for each example in the data set, where each total for the selected variable represents the relative contribution of each variable to the determination of the decision-support system output. This total will be used to rank each variable according to its relative contribution to the determination of the decision-support system output.
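Steps (k) through (p) translate almost directly into code. This is a minimal sketch, assuming that the trained decision-support system is available as a function `predict` that maps one input vector to one output value in the forward mode; the linear `toy_predict` and the three-example data set are hypothetical stand-ins for a trained network and real observations.

```python
from typing import Callable, List, Sequence

def sensitivity_totals(
    examples: Sequence[Sequence[float]],
    predict: Callable[[Sequence[float]], float],
) -> List[float]:
    """For each variable, accumulate the squared change in output caused by
    replacing that variable with its data-set average (steps (k)-(p))."""
    n_vars = len(examples[0])
    # (k) average observation value of each variable
    averages = [sum(col) / len(examples) for col in zip(*examples)]
    totals = [0.0] * n_vars
    for example in examples:  # (p) every example in the data set
        normal = predict(example)  # (l) the normal output
        for i in range(n_vars):  # (o) every variable in the example
            modified = list(example)
            modified[i] = averages[i]  # (m) substitute the average value
            totals[i] += (normal - predict(modified)) ** 2  # (n) accumulate
    return totals

def rank_variables(examples, predict, names):
    """Order variable names from largest to smallest total contribution."""
    totals = sensitivity_totals(examples, predict)
    return [name for _, name in sorted(zip(totals, names), reverse=True)]

# Usage: a stand-in "system" that relies mostly on x0, a little on x1, never on x2.
toy_predict = lambda x: 0.9 * x[0] + 0.1 * x[1]
data = [[0.0, 0.0, 5.0], [1.0, 1.0, 7.0], [2.0, 0.0, 9.0]]
ranking = rank_variables(data, toy_predict, ["x0", "x1", "x2"])
```

A variable the system ignores (x2 here) accumulates a total of zero, while the variable that drives the output most (x0) accumulates the largest total and is ranked first.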

As shown herein, computer-based decision-support systems such as neural networks reveal that certain input factors, which were not initially considered to be important, can influence an outcome. This ability of a neural network to reveal the relevant input factors permits its use in guiding the design of diagnostic tests. Thus, a method of designing a diagnostic test and a method of evaluating the utility of a diagnostic test are also provided. In each instance, the data from the test or possible test are added to the input of the decision-support system. If the results are improved when the data are included in the input, then the diagnostic test may have clinical utility. In this manner, tests that heretofore were not known to be of value in diagnosis of a particular disorder are identified, or new tests can be developed. Neural networks can add robustness to diagnostic tests by discounting the effects of spurious data points and by identifying other data points that might be substituted, if any.

Networks are trained on one set of variables, and then clinical data from diagnostic or biochemical tests and/or additional patient information are added to the input data. Any variable that improves the results compared to its absence is selected. As a result, particular tests that heretofore were of unknown value in diagnosing a particular disorder can be shown to have relevance. For example, the presence or absence of particular spots on a western blot of serum antibodies can be correlated with a disease state. Based on the identity of particular spots (i.e., antigens), new diagnostic tests can be developed.

An example of the application of the prediction technology to aid in the diagnosis of disease, and more particularly the use of neural network techniques with inputs from various information sources to aid in the diagnosis of the disease endometriosis, is provided. A trained set of neural networks operative according to a consensus of networks in a computer system is employed to evaluate specific clinical associations, for example obtained by survey, some of which may not generally be associated with a disease condition. This is demonstrated with an exemplary disease condition, endometriosis, and factors used to aid in the diagnosis of endometriosis are provided. The neural network training is based on correlations between answers to questions furnished by physicians of a significant number of clinical patients whose disease condition has been surgically verified, herein termed clinical data.

A plurality of factors, about twelve to about sixteen, particularly a set of
fourteen factors, in a specific trained neural network extracted from a
collection of over forty clinical data factors have been identified as
primary indicia for endometriosis. The following parameters were
identified as being significant: age, parity (number of births), gravidity
(number of pregnancies), number of abortions, smoking (packs/day), past
history of endometriosis, dysmenorrhea, pelvic pain, abnormal
pap/dysplasia, history of pelvic surgery, medication history, pregnancy
hypertension, genital warts and diabetes. Other similar sets of parameters
were also identified. Subsets of these variables also may be employed in
diagnosing endometriosis.

In particular, any subset of the selected set of parameters,
particularly the set of fourteen variables, that contains one (or more) of the
following combinations of three variables can be used with a
decision-support system for diagnosis of endometriosis:

a) number of births, history of endometriosis, history of pelvic
surgery;

b) diabetes, pregnancy hypertension, smoking;

c) pregnancy hypertension, abnormal pap smear/dysplasia,
history of endometriosis;

d) age, smoking, history of endometriosis;

e) smoking, history of endometriosis, dysmenorrhea;

f) age, diabetes, history of endometriosis;

g) pregnancy hypertension, number of births, history of
endometriosis;

h) smoking, number of births, history of endometriosis;

i) pregnancy hypertension, history of endometriosis, history of
pelvic surgery;

j) number of pregnancies, history of endometriosis, history of
pelvic surgery;

k) number of births, abnormal PAP smear/dysplasia, history of
endometriosis;

l) number of births, abnormal PAP smear/dysplasia,
dysmenorrhea;

m) history of endometriosis, history of pelvic surgery,
dysmenorrhea; and

n) number of pregnancies, history of endometriosis,
dysmenorrhea.
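
The three-variable criterion above can be checked mechanically. The following is a minimal Python sketch that encodes combinations a) to n) and reports which of them are fully contained in a candidate subset of variables. The short variable labels are hypothetical names chosen for this illustration, not identifiers from the described software.

```python
# Hypothetical short labels for the variables named in combinations a) to n).
TRIPLES = {
    "a": {"births", "hx_endometriosis", "hx_pelvic_surgery"},
    "b": {"diabetes", "preg_hypertension", "smoking"},
    "c": {"preg_hypertension", "abnormal_pap", "hx_endometriosis"},
    "d": {"age", "smoking", "hx_endometriosis"},
    "e": {"smoking", "hx_endometriosis", "dysmenorrhea"},
    "f": {"age", "diabetes", "hx_endometriosis"},
    "g": {"preg_hypertension", "births", "hx_endometriosis"},
    "h": {"smoking", "births", "hx_endometriosis"},
    "i": {"preg_hypertension", "hx_endometriosis", "hx_pelvic_surgery"},
    "j": {"pregnancies", "hx_endometriosis", "hx_pelvic_surgery"},
    "k": {"births", "abnormal_pap", "hx_endometriosis"},
    "l": {"births", "abnormal_pap", "dysmenorrhea"},
    "m": {"hx_endometriosis", "hx_pelvic_surgery", "dysmenorrhea"},
    "n": {"pregnancies", "hx_endometriosis", "dysmenorrhea"},
}


def matching_triples(subset):
    """Return the labels of every listed combination fully contained in `subset`."""
    chosen = set(subset)
    return [label for label, triple in TRIPLES.items() if triple <= chosen]


def qualifies(subset):
    """True if the subset contains at least one of the listed combinations."""
    return bool(matching_triples(subset))


print(matching_triples(["age", "smoking", "hx_endometriosis", "diabetes"]))  # ['d', 'f']
```

A subset such as {age, diabetes, smoking} contains none of the listed triples, so `qualifies` returns False for it.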

Diagnostic software and exemplary neural networks that use the
variables for diagnosis of endometriosis and the risk of delivery before a
specified time are also provided. Software that generates a clinically useful
endometriosis index, and software that generates an index for assessing
the risk of delivery before a specified time, are provided.

In other embodiments, the performance of a diagnostic neural
network system used to test for endometriosis is enhanced by including
variables based on biochemical test results from a relevant biochemical
test as part of the factors used for training the network (herein termed
biochemical test data, which includes tests from analyses and data such as
vital signs, for example pulse rate and blood pressure). An exemplary
network that results therefrom is an augmented neural network that
employs fifteen input factors, including results of the biochemical test and
the fourteen clinical parameters. The set of weights of the eight
augmented neural networks differs from the set of weights of the eight
clinical data neural networks. The exemplified biochemical test employs
an immuno-diagnostic test format, such as the ELISA diagnostic test
format.

The methodology applied to endometriosis as exemplified herein
can be similarly applied and used to identify factors for other disorders,
including, but not limited to, gynecological disorders and
female-associated disorders, such as, for example, infertility, prediction of
pregnancy-related events, such as the likelihood of delivery within a
particular time period, and pre-eclampsia. Neural networks, thus, can be
trained to predict the disease state based on the identification of factors
important in predicting the disease state and combining them with
biochemical data.

The resulting diagnostic systems may be adapted and used not
only for diagnosing the presence of a condition or disorder, but also for
assessing the severity of the disorder and as an aid in selecting a course of
treatment.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGURE 1 is a flow chart for developing a patient-history-based
diagnostic test process;

FIGURE 2 is a flow chart for developing a biochemical diagnostic
test;

FIGURES 3A-B provide a flow chart of the process for isolating
important variables;

FIGURE 4 is a flow chart of the process of training one or a set of
neural networks involving a partitioning of variables;

FIGURE 5 is a flow chart for developing a biochemical diagnostic
test;

FIGURE 6 is a flow chart for determining the effectiveness of a
biochemical diagnostic test;

FIGURE 7 is a schematic diagram of a neural network trained on
clinical data of the form used for the consensus network of a plurality of
neural networks;

FIGURE 8 is a schematic diagram of a second embodiment of a
neural network trained on clinical data augmented by test results data of
the form used for the consensus of eight neural networks;

FIGURE 9 is a schematic diagram of a processing element at each
node of the neural network;

FIGURE 10 is a schematic diagram of a consensus network of eight
neural networks using either the first or second embodiment of the neural
network;

FIGURE 11 is a depiction of an exemplary interface screen of the
user interface in the diagnostic endometriosis index;

FIGURE 12 depicts an exemplary screen showing main menu, tool
bar and results display in the user interface using the software
(Appendix III) for assessing preterm delivery;

FIGURE 13 depicts an exemplary Edit Record dialog box in preterm
delivery software;

FIGURE 14 depicts an exemplary Go To dialog box in the software;

FIGURE 15 depicts an exemplary Help About dialog box in the
software;

FIGURES 16A and 16B show exemplary outputs from the
software; FIGURE 16B includes the input data as well;

FIGURE 17 is a schematic diagram of a neural network (EGAS)
trained on clinical data of the form used for the consensus network of a
plurality of neural networks; and

FIGURE 18 is a schematic diagram of a neural network, such as
EGAD7f and EGAD14f, trained on clinical data of the form used for the
consensus network of a plurality of neural networks.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Definitions

Unless defined otherwise, all technical and scientific terms used
herein have the same meaning as is commonly understood by one of skill
in the art to which this invention belongs. All patents and publications
referred to herein are incorporated by reference.

As used herein, a decision-support system, also referred to as a
"data mining system" or a "knowledge discovery in data system", is any
system, typically a computer-based system, that can be trained on data to
classify the input data and then subsequently used with new input data to
make decisions based on the training data. These systems include, but
are not limited to, expert systems, fuzzy logic, non-linear regression
analysis, multivariate analysis, decision tree classifiers, Bayesian belief
networks and, as exemplified herein, neural networks.

As used herein, an adaptive machine learning process refers to any
system whereby data are used to generate a predictive solution. Such
processes include those effected by expert systems, neural networks, and
fuzzy logic.

As used herein, an expert system is a computer-based problem-solving
and decision-support system based on knowledge of its task and logical
rules or procedures for using the knowledge. Both the knowledge and the
logic are entered into the computer from the experience of human
specialists in the area of expertise.

As used herein, a neural network, or neural net, is a parallel
computational model comprised of densely interconnected adaptive
processing elements. In the neural network, the processing elements are
configured into an input layer, an output layer and at least one hidden
layer. Suitable neural networks are known to those of skill in this art
(see, e.g., U.S. Patents 5,251,626; 5,473,537; and 5,331,550; Baxt
(1991) "Use of an Artificial Neural Network for the Diagnosis of
Myocardial Infarction," Annals of Internal Medicine 115:843; Baxt (1992)
"Improving the Accuracy of an Artificial Neural Network Using Multiple
Differently Trained Networks," Neural Computation 4:772; Baxt (1992)
"Analysis of the clinical variables that drive decision in an artificial neural
network trained to identify the presence of myocardial infarction," Annals
of Emergency Medicine 21:1439; and Baxt (1994) "Complexity, chaos
and human physiology: the justification for non-linear neural
computational analysis," Cancer Letters 77:85).

As used herein, a processing element, which may also be known as
a perceptron or an artificial neuron, is a computational unit which maps
input data from a plurality of inputs into a single binary output in
accordance with a transfer function. Each processing element has an
input weight corresponding to each input, which is multiplied with the
signal received at that input to produce a weighted input value. The
processing element sums the weighted input values of each of the inputs
to generate a weighted sum, which is then compared to the threshold
defined by the transfer function.

As used herein, a transfer function, also known as a threshold
function or an activation function, is a mathematical function which
creates a curve defining two distinct categories. Transfer functions may
be linear, but, as used in neural networks, are more typically non-linear,
including quadratic, polynomial, or sigmoid functions.
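
The processing element and transfer function defined above can be illustrated with a minimal Python sketch. The hard-limit `step` function and the logistic `sigmoid` are two common transfer functions; the `bias` term (equivalent to a negative threshold) and all function names are assumptions made for this illustration.

```python
import math


def step(z, threshold=0.0):
    """Hard-limit transfer function: 1 if the weighted sum exceeds the threshold."""
    return 1 if z > threshold else 0


def sigmoid(z):
    """Logistic (sigmoid) transfer function, a smooth non-linear alternative."""
    return 1.0 / (1.0 + math.exp(-z))


def processing_element(inputs, weights, bias=0.0, transfer=step):
    """Weight each input, sum the weighted inputs, then apply the transfer function."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return transfer(weighted_sum)


print(processing_element([1, 0, 1], [0.5, -0.2, 0.3]))  # weighted sum 0.8 -> 1
```

With `transfer=step` the element yields the binary output described above; passing `transfer=sigmoid` gives a graded output between 0 and 1, which is what gradient-based training methods require.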

As used herein, backpropagation, also known as backprop, is a
training method for neural networks for correcting errors between the
target output and the actual output. The error signal is fed back through
the processing layers of the neural network, causing changes in the
weights of the processing elements to bring the actual output closer to
the target output.
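
A minimal NumPy sketch of backpropagation for a network with one hidden layer of sigmoid processing elements, trained by batch gradient descent on mean squared error. The layer size, learning rate, epoch count and the OR-style toy data are illustrative assumptions, not values from the described networks.

```python
import numpy as np


def sigmoid(z):
    """Logistic transfer function."""
    return 1.0 / (1.0 + np.exp(-z))


def train_backprop(X, y, hidden=4, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer sigmoid network by batch backpropagation.

    Minimizes mean squared error; the constant factor 2 of its gradient is
    folded into the learning rate. Returns the weights and the loss per epoch.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        # Forward pass through the hidden and output layers.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        err = out - y  # actual output minus target output
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate the error signal back through the layers.
        d_out = err * out * (1.0 - out)
        d_h = (d_out @ W2.T) * h * (1.0 - h)
        # Move the weights so the actual output approaches the target output.
        W2 -= lr * (h.T @ d_out) / len(X)
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * (X.T @ d_h) / len(X)
        b1 -= lr * d_h.mean(axis=0)
    return (W1, b1, W2, b2), losses


# Toy OR-style data: four examples, two binary inputs, one target output.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)
params, losses = train_backprop(X, y)
```

The recorded `losses` list lets one confirm that the error between target and actual output falls as the weights are adjusted.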






24727-801F 



As used herein, Quickprop is a backpropagation method that was
proposed, developed and reported by Fahlman ("Fast Learning Variations
on Back-Propagation: An Empirical Study", Proceedings of the 1988
Connectionist Models Summer School, Pittsburgh, 1988, D. Touretzky, et
al., eds., pp. 38-51, Morgan Kaufmann, San Mateo, CA; and, with
Lebiere, "The Cascade-Correlation Learning Architecture", Advances in
Neural Information Processing Systems 2 (Denver, 1989), D. Touretzky,
ed., pp. 524-32, Morgan Kaufmann, San Mateo, CA).
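
Quickprop treats the error curve for each weight as a parabola and jumps toward its minimum using the current and previous slopes: Δw(t) = S(t) / (S(t−1) − S(t)) · Δw(t−1), where S is ∂E/∂w. The sketch below applies this rule to a single weight, falls back to a plain gradient step when no usable previous step exists, and caps growth at μ times the previous step (μ = 1.75 is the value commonly used with Quickprop). It is a simplified illustration that omits refinements of the published method, and the quadratic error function is a toy assumption.

```python
def quickprop_1d(grad, w, lr=0.1, mu=1.75, steps=20):
    """Minimize a one-weight error function with a simplified Quickprop rule.

    grad(w) returns the slope S = dE/dw at w.
    """
    prev_grad = None
    prev_step = 0.0
    for _ in range(steps):
        g = grad(w)
        if prev_grad is None or prev_step == 0.0 or prev_grad == g:
            # No usable history yet: take an ordinary gradient-descent step.
            step = -lr * g
        else:
            # Jump to the minimum of the parabola through the last two slopes.
            step = g / (prev_grad - g) * prev_step
            # Limit growth to mu times the previous step size.
            limit = mu * abs(prev_step)
            step = max(-limit, min(limit, step))
        w += step
        prev_grad, prev_step = g, step
    return w


# Toy error E(w) = (w - 3)^2, so dE/dw = 2(w - 3) and the minimum is at w = 3.
w_final = quickprop_1d(lambda w: 2.0 * (w - 3.0), w=0.0)
```

On this quadratic the parabolic jump is exact, so only the growth cap keeps the method from reaching w = 3 on its second step; it arrives on the third.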

As used herein, diagnosis refers to a predictive process in which
the presence, absence, severity or course of treatment of a disease,
disorder or other medical condition is assessed. For purposes herein,
diagnosis will also include predictive processes for determining the
outcome resulting from a treatment.

As used herein, a patient or subject includes any mammal for
whom diagnosis is contemplated. Humans are the preferred subjects.

As used herein, biochemical test data refers to the results of any
analytical methods, which include, but are not limited to, immunoassays,
bioassays, chromatography, data from monitors and imagers, and
measurements, and also includes data related to vital signs and body
function, such as pulse rate, temperature, blood pressure, the results of,
for example, EKG, ECG and EEG, biorhythm monitors and other such
information. The analysis can assess, for example, analytes, serum
markers, antibodies, and other such material obtained from the patient
through a sample.

As used herein, patient historical data refers to data obtained from
a patient, such as by questionnaire format, but typically does not include
biochemical test data as used herein, except to the extent such data is
historical.

As used herein, a desired solution is one that generates a number or result
whereby a diagnosis of a disorder can be generated.

As used herein, a training example includes the
observation data for a single diagnosis, typically the observation data
related to one patient.

As used herein, the parameters identified from patient historical
data are herein termed observation factors or values or variables. For
example, patient data will include information with respect to an individual
patient's smoking habits. The variable associated with that will be
smoking.

As used herein, partition means to select a portion of the data,
such as 80%, and use it for training a neural net and to use the remaining
portion as test data. Thus, the network is trained on all but one portion
of the data. The process can then be repeated and a second network
trained. The process is repeated until every partition has been used as
test data and as training data.

As used herein, the method of training by partitioning the available
data into a plurality of subsets is generally referred to as the "holdout
method" of training. The holdout method is particularly useful when the
data available for network training are limited.
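
A minimal Python sketch of the partitioning (holdout) scheme: the example indices are split into k folds, and each fold serves once as test data while the remaining folds are used for training, so with k = 5 each network is trained on 80% of the data. Contiguous folds are used here for simplicity; shuffling the examples before splitting is a common refinement and is an assumption, not something stated above.

```python
def partitions(n_examples, k):
    """Split example indices 0..n-1 into k roughly equal, contiguous folds."""
    indices = list(range(n_examples))
    size, extra = divmod(n_examples, k)
    folds, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        folds.append(indices[start:end])
        start = end
    return folds


def holdout_splits(n_examples, k):
    """Yield (train, test) index lists; each fold serves once as the test data."""
    folds = partitions(n_examples, k)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


for train, test in holdout_splits(10, 5):
    print(len(train), test)  # one network would be trained per split
```

Each split would train one network, so k partitions yield k networks that together have seen every example as test data exactly once.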

As used herein, training refers to the process in which input data
are used to generate a decision-support system. In particular, with
reference to neural nets, training is a trial-and-error process involving a
series of iterative adjustments to the processing element weights so that
a particular processing element provides an output which, when
combined with the outputs of other processing elements, generates a
result which minimizes the resulting error between the outputs of the
neural network and the desired outputs as represented in the training
data.

As used herein, a variable selection process is a systematic method
whereby combinations of variables that yield predictive results are
selected from any available set. Selection is effected by maximizing
predictive performance of subsets such that the addition of further
variables does not improve the result. The preferred methods provided
herein advantageously permit selection of variables without considering all
possible combinations.
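
One generic way to select variables without trying every combination is greedy forward selection over a ranked list, sketched below in Python. `score` is a hypothetical, caller-supplied function (for example, the consensus performance of networks trained on a given variable subset). The search adds the highest-ranked variable that improves the score and stops when no remaining variable helps. This is an illustration of the general idea, not necessarily the exact selection procedure of the preferred embodiments.

```python
def forward_select(ranked_vars, score):
    """Greedy forward selection over a ranked list of candidate variables.

    `score(subset)` returns predictive performance for a list of variables.
    """
    selected, best = [], score([])
    improved = True
    while improved:
        improved = False
        for var in ranked_vars:
            if var in selected:
                continue
            trial = score(selected + [var])
            if trial > best:
                selected, best = selected + [var], trial
                improved = True
                break  # restart from the top of the ranking
    return selected, best


# Toy scoring function: variables "a" and "c" help, anything else slightly hurts.
def toy_score(variables):
    useful = {"a", "c"}
    return len(set(variables) & useful) - 0.1 * len(set(variables) - useful)


selected, best = forward_select(["a", "b", "c"], toy_score)
print(selected, best)  # ['a', 'c'] 2.0
```

The number of `score` evaluations grows roughly with the number of variables times the size of the selected subset, rather than with the 2^N possible subsets.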

As used herein, a candidate variable is a selected item from
collected observations from a group of test patients for the diagnostic
embodiments, or from other records, such as financial records, that can be used
with the decision-support system. Candidate variables will be obtained by
collecting data, such as patient data, and categorizing the observations as
a set of variables.

As used herein, important selected variables refer to variables that
enhance the network performance of the task at hand. Inclusion of all
available variables does not result in the optimal neural network; some
variables, when included in network training, lower the network
performance. Networks that are trained only with relevant parameters
result in increased network performance. These variables are also
referred to herein as a subset of relevant variables.

As used herein, ranking refers to a process in which variables are
listed in an order for selection. Ranking may be arbitrary or, preferably, is
ordered. Ordering may be effected, for example, by a statistical analysis
that ranks the variables in order of importance with respect to the task,
such as diagnosis, or by a decision-support system based analysis.
Ranking may also be effected, for example, by human experts, by
rule-based systems, or by any combination of these methods.

As used herein, a consensus of neural networks refers to the linear
combination of outputs from a plurality of neural networks, where the
weight on each of the outputs is determined arbitrarily or set to an equal
value.
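
The consensus defined above is a weighted sum of network outputs. A minimal Python sketch, assuming each trained network has already produced one numeric output for the patient; the eight output values in the usage line are made up for illustration.

```python
def consensus(outputs, weights=None):
    """Linear combination of the outputs of several networks.

    With no weights given, every network receives the same weight 1/N.
    """
    if weights is None:
        weights = [1.0 / len(outputs)] * len(outputs)
    if len(weights) != len(outputs):
        raise ValueError("need one weight per network output")
    return sum(w * o for w, o in zip(weights, outputs))


# Hypothetical outputs of eight trained networks for one patient.
score = consensus([0.62, 0.71, 0.58, 0.66, 0.69, 0.60, 0.64, 0.70])
```

With equal weights the consensus is simply the mean output, which damps the effect of any single network that settled in a poor local minimum.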

As used herein, a greedy algorithm is a method for optimizing a
data set by determining whether to include or exclude a point from a
given data set. The set begins with no elements and sequentially selects
an element from the feasible set of remaining elements by myopic
optimization, in which, given any partial solution, another value that
improves the objective the most is selected.

As used herein, a genetic algorithm is a method that begins with an
initial population of randomly generated neural networks which are run
through a training cycle and ranked according to their performance in
reaching the desired target. The poor-performing networks are removed
from the population, with the fitter networks being retained and selected
for the crossover process to produce offspring that retain the desirable
characteristics of the parent networks.
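
To keep the illustration self-contained, the Python sketch below evolves bit strings rather than neural networks (the bits could, for example, stand for variable-inclusion masks); this substitution, the toy fitness function and all parameter values are assumptions. Each generation ranks the population, drops the poorer half, and refills it with mutated single-point crossovers of the retained parents.

```python
import random


def evolve(fitness, n_bits, pop_size=20, generations=30, mutation=0.05, seed=0):
    """Toy genetic algorithm over bit strings (stand-ins for networks).

    Returns the best genome found and the best fitness seen in each generation.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Rank by performance; keep the fitter half as parents.
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with a small probability (mutation).
            child = [bit ^ 1 if rng.random() < mutation else bit for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness), history


# Toy fitness: the number of 1 bits ("one-max").
best, history = evolve(sum, n_bits=16)
```

Because the fitter half is carried over unchanged, the best fitness in the population can never decrease from one generation to the next.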

As used herein, performance of a system is said to be improved or
higher when the results more accurately predict or determine a particular
outcome. It is also to be understood that the performance of a system
will typically be better as more training examples are used. Thus, the
systems herein will improve over time as they are used and more patient
data are accumulated and then added to the systems as training data.

As used herein, sensitivity = TP/(TP + FN); specificity =
TN/(TN + FP), where TP = true positives; TN = true negatives; FP = false
positives; and FN = false negatives. Clinical sensitivity measures how well
a test detects patients with the disease; clinical specificity measures how
well a test correctly identifies those patients who do not have the
disease.

As used herein, positive predictive value (PPV) is TP/(TP + FP); and
negative predictive value (NPV) is TN/(TN + FN). Positive predictive value
is the likelihood that a patient with a positive test actually has the
disease, and negative predictive value is the likelihood that a patient with
a negative test result does not have the disease.
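
The four formulas above can be computed directly from the counts of a 2x2 confusion table. A minimal Python sketch; the counts in the usage line are hypothetical.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-table counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts: 50 patients with the disease, 120 without.
m = diagnostic_metrics(tp=40, tn=90, fp=30, fn=10)
print(m["sensitivity"], m["specificity"])  # 0.8 0.75
```

Unlike sensitivity and specificity, PPV and NPV depend on how common the disease is in the tested population, so the same test can have quite different predictive values in different clinics.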

As used herein, fuzzy logic is an approach to deal with systems
that cannot be described precisely. Membership functions (membership
in a data set) are not binary in fuzzy logic systems; instead, membership
functions may take on fractional values. Therefore, an element can be
simultaneously contained in two contradictory sets, albeit with different
coefficients of set membership. Thus, this type of approach is useful for
answering questions in which there is no yes or no answer. In particular,
this type of logic is suitable for categorizing responses from patient
historical questionnaires, in which the answer is often one of degree.
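
As an illustration of fractional membership, the Python sketch below assigns a smoking answer (packs per day) partial membership in two overlapping sets, "low" and "high" consumption. The linear ramp and the 0.5 and 2.0 packs-per-day breakpoints are hypothetical choices made only for this example.

```python
def ramp_up(x, lo, hi):
    """Membership rising linearly from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def smoking_memberships(packs_per_day):
    """Degrees of membership in two overlapping sets (hypothetical breakpoints)."""
    high = ramp_up(packs_per_day, 0.5, 2.0)
    return {"low": 1.0 - high, "high": high}


print(smoking_memberships(1.25))  # {'low': 0.5, 'high': 0.5}
```

An answer of 1.25 packs per day therefore belongs equally to both sets, and such membership values can be supplied to a network as inputs in place of a single yes/no answer.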

1. General considerations and general methodology

It has been determined that a number of techniques can be used to
train neural networks for analyzing observation values such as patient
history and/or biochemical information. Depending upon the
characteristics of the available data and the problem to be analyzed,
different neural network training techniques can be used. For example,
where large amounts of training inputs are available, methodology may be
adopted to eliminate redundant training information.

As shown herein, neural networks may also reveal that certain
input factors that were not initially considered to be important can
influence an outcome, as well as reveal that presumably important factors
are not outcome determinative. The ability of neural networks to reveal
the relevant and irrelevant input factors permits their use in guiding the
design of a diagnostic test. As shown herein, neural networks, and other
such data mining tools, are a valuable advance in diagnostics, providing
an opportunity to increase the sensitivity and specificity of a diagnostic
test. As shown herein, care must be taken to avoid the potential of a
poor-accuracy answer due to the phenomenon of local minima. The methods
herein provide a means to avoid this problem or at least minimize it.

In developing diagnostic procedures, and in
particular diagnostic tests that are based solely or in part on patient
information, a number of problems have been solved. For example, there
is generally a limited amount of data because there is a limited number of
patients for whom training data are available. To solve this, as described
below, the patient information is partitioned when training the network.
Also, there is generally a large number of input observation factors
available for use in connection with the available data, so methods for
ranking and selecting observations were developed.

Also, there is generally a large number of binary (true/false) input
factors in the available patient data, but these factors are generally sparse
in nature (values that are positive or negative in only a small percentage
of cases of the binary input factors in the available patient data). Also,
there is a high degree of overlap between the positive and negative
factors of the condition being diagnosed.

These characteristics and others impact the choice of procedures
and methods used to develop a diagnostic test. These problems are
addressed and solved herein.

2. Development of patient history diagnostic test
Diagnostic test 

Methods for diagnosis based solely on patient history data are
provided. As demonstrated herein, it is possible to provide
decision-support systems that rely only on patient history information but
that aid in diagnosis. Consequently, the resulting systems can then be used to
improve the predictive ability of biochemical test data, to identify new
disease markers, to develop biochemical tests, and to identify tests that
heretofore were not thought to be predictive of a particular disorder.

The methods may also be used to select an appropriate course of
treatment by predicting the result of a selected course of treatment and to
predict status following therapy. The input variables for training would be
derived from, for example, electronic patient records that indicate
diagnoses and other available data, including selected treatments and
outcomes. The resulting decision-support system would then be used
with all available data to, for example, categorize women into different
classes that will respond to different treatments and predict the outcome
of a particular treatment. This permits selection of a treatment or
protocol most likely to be successful.

Similarly, the systems can be used to identify new utilities for
drugs or therapies and also to identify uses for particular drugs and
therapies. For example, the systems can be used to select
subpopulations of patients for whom a particular drug or therapy is
effective. Thus, methods for expanding the indication for a drug or
therapy and identifying new drugs and therapies are provided.

Collection of patient data, generation of variables and overview
To exemplify the methods herein, Fig. 1 sets forth a flow chart for
developing a patient-history-based diagnostic test process. The process
begins with collection of patient history data (Step A). Patient history
data or observation values are obtained from patient questionnaires,
clinical results, in some instances diagnostic test results, and patient
medical records and supplied in computer-readable form to a system
operating on a computer. In the digital computer, the patient history
data are categorized into a set of variables of two forms: binary (such as
true/false) values and quantitative (continuous) values. A binary-valued
variable might include the answer to the question, "Do you smoke?" A
quantitative-valued variable might be the answer to the question, "How
many packs per day do you smoke?" Other values, such as membership
functions, may also be useful as input vehicles.

The patient history data will also include a target or desired
outcome variable that would be assumed to be indicative of the presence,
absence, or severity of the medical condition to be diagnosed. This
desired outcome information is useful for neural network training. The
selection of data to be included in the training data can be made with the
knowledge or assumption of the presence, severity, or absence of the
medical condition to be diagnosed. As noted herein, diagnosis may also
include assessment of the progress and/or effectiveness of a therapeutic
treatment.
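
A minimal Python sketch of how one questionnaire record might be turned into a training example: a binary answer becomes 1.0 or 0.0, a quantitative answer is kept as a number, and the desired outcome becomes the target. The field names (`smokes`, `packs_per_day`, `diagnosed`) and the dataclass layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrainingExample:
    inputs: list   # observation values for one patient
    target: float  # desired outcome, e.g. 1.0 = condition confirmed


def encode(record):
    """Map one questionnaire record (hypothetical field names) to a training example."""
    inputs = [
        1.0 if record["smokes"] else 0.0,        # binary: "Do you smoke?"
        float(record.get("packs_per_day", 0)),  # quantitative: packs per day
    ]
    return TrainingExample(inputs, 1.0 if record["diagnosed"] else 0.0)


example = encode({"smokes": True, "packs_per_day": 1.5, "diagnosed": False})
```

A missing quantitative answer defaults to 0 here; how missing answers should actually be encoded is a design decision the sketch does not settle.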

The number of variables, which can be defined and thus generated,
can be unwieldy. Binary variables are typically sparse in that the number
of positive (or negative) responses is often a small percentage of the
overall number of responses. Thus, in instances in which there is a large
number of variables and a small number of patient cases available in a
typical training data environment, steps are taken to isolate from the
available variables a subset of variables important to the diagnosis (Step
B). The specific choice of the subset of variables from among the
available variables will affect the diagnostic performance of the neural
network.

The process outlined herein has been found to produce a subset of
variables which is comparable or superior in sensitivity and reliability to
the subset of variables typically chosen by a trained human expert, such
as a physician. In some instances, the variables are prioritized or placed
in order of rank or relevance.

Thereafter, the final neural networks to be used in the diagnostic
procedure are trained (Step C). In preferred embodiments, a consensus
(i.e., a plurality) of networks is trained. The resulting networks form the
decision-support functionality for the completed patient history diagnostic
test (Step D).

Method for isolation of important variables

A method for isolation of important variables is provided herein.
The method permits sets of effective variables to be selected without
comparing every possible combination of variables. The important
variables may be used as the inputs for the decision-support systems.

Isolation of important or relevant variables: ranking the variables

Figures 3A-B provide a flow chart of the process for isolating the
important or relevant variables (Step E) within a diagnostic test. Such a
process is typically conducted using a digital computer system to which
potentially relevant information has been provided. This procedure ranks
the variables in order of importance using two independent methods, then
selects a subset of the available variables from the uppermost of the
ranking. As noted above, other ranking methods can be used by those of
skill in the art in place of chi square or sensitivity analysis. Also, if
X is set to N (the total number of candidate variables), then ranking can
be arbitrary.

The system trains a plurality of neural networks on the data (Step I), as
explained hereinafter, then generates a sensitivity analysis over all trained
networks to determine to what extent each input variable was used in the
network to perform the diagnosis (Step J). A consensus sensitivity analysis
of each input variable is determined by averaging the individual sensitivity
analysis results for each of the networks trained. Based upon sensitivity, a
ranking order for each of the variables available from the patient history
information is determined (Step K).

Ranking the variables

In preferred embodiments, the variables are ranked using a
statistical analysis, such as a chi square analysis, and/or a
decision-support system-based analysis, such as a sensitivity analysis. A
sensitivity analysis and a chi square analysis are used in the exemplary
embodiment to rank variables. Other statistical methods and/or
decision-support system-based analyses, including but not limited to
regression analysis, discriminant analysis and other methods known to
those of skill in the art, may be used. The ranked variables may be used
to train the networks or, preferably, used in the method of variable
selection provided herein.
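
For a binary variable and a binary outcome, a chi square ranking can be sketched with the Pearson statistic of the 2x2 contingency table, as below. Using this particular statistic, and restricting the sketch to binary variables (quantitative variables would first need to be binned), are assumptions; the text above does not specify the form of the chi square analysis.

```python
def chi_square_2x2(feature, outcome):
    """Pearson chi-square statistic for a binary feature vs. a binary outcome."""
    n = len(feature)
    counts = {(f, o): 0 for f in (0, 1) for o in (0, 1)}
    for f, o in zip(feature, outcome):
        counts[(f, o)] += 1
    chi2 = 0.0
    for f in (0, 1):
        for o in (0, 1):
            row = counts[(f, 0)] + counts[(f, 1)]
            col = counts[(0, o)] + counts[(1, o)]
            expected = row * col / n
            if expected:  # skip empty rows/columns (constant feature or outcome)
                chi2 += (counts[(f, o)] - expected) ** 2 / expected
    return chi2


def rank_by_chi_square(columns, outcome):
    """Order variable names from most to least associated with the outcome."""
    scores = {name: chi_square_2x2(col, outcome) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


outcome = [1, 1, 0, 0]
columns = {"a": [1, 1, 0, 0], "b": [1, 0, 1, 0]}
print(rank_by_chi_square(columns, outcome))  # ['a', 'b']
```

Variable "a" tracks the outcome exactly (statistic 4.0) while "b" is independent of it (statistic 0.0), so "a" is ranked first.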

The method employs a sensitivity analysis in which each input is
varied and the corresponding change in output is measured (see, also,
Modai, et al. (1993) "Clinical Decisions for Psychiatric Inpatients and
Their Evaluation by Trained Neural Networks", Methods of Information in
Medicine 32:396-99; Wilding et al. (1994) "Application of
Backpropagation Neural Networks to Diagnosis of Breast and Ovarian
Cancer", Cancer Letters 77:145-53; Ruck et al. (1990) "Feature Selection
in Feed-Forward Neural Networks", Neural Network Computing 20:40-48;
Utans, et al. (1993) "Selecting Neural Network Architectures Via the
Prediction Risk: Application to Corporate Bond Rating Prediction",
Proceedings of the First International Conference on Artificial Intelligence
Applications on Wall Street, Washington, D.C., IEEE Computer Society
Press, pp. 35-41; and Penny et al. (1996) "Neural Networks in Clinical
Medicine", Medical Decision-support 4:386-398). Such methods have not
heretofore been used to select important variables as described
herein. For example, sensitivity analysis has been reported to be used to
develop a statistical approach to determine the relationships between the
variables, but not for selection of important variables (see, Baxt et al.
(1995) "Bootstrapping Confidence Intervals for Clinical Input Variable
Effects in a Network Trained to Identify the Presence of Myocardial
Infarction," Neural Computation 7:624-38). Any such sensitivity
analyses may be used as described herein as part of the selection of
important variables as an aid to diagnosis.

Step K, Fig. 3A, provides an outline of the sensitivity analysis. Each network of a plurality of trained neural networks (networks N1 through Nn) is run in the forward mode (no training) for each training example S1 through Sx (input data group for which the true output is known or suspected; there must be at least two training examples), where "x" is the number of training examples. The output of each network N1-Nn for each training example is recorded, i.e., stored in memory. A new training example is defined containing the average value for each input variable within all training examples. One at a time, each input variable within each original training example is replaced with its corresponding average value V1(avg) through Vy(avg), where "y" is the number of variables, and the modified training example S' is again executed through the multiple networks to produce a modified output for each network for each variable. The differences between the output from the original training example and the modified output for each input variable are then squared and summed (accumulated) to obtain individual sums corresponding to each input variable. To provide an illustration, for 10 separate neural networks N1-N10 and 5 different training examples S1-S5, each having 15 variables V1-V15, each of the 5 training examples would be run through the 10 networks to produce 50 total outputs. Taking variable V1 from each of the training examples, an average value V1(avg) is calculated. This averaged variable V1(avg) is substituted into each of the 5 training examples to create modified training examples S1'-S5', and they are again run through the 10 networks. Fifty modified output values are generated by the networks N1-N10 for the 5 training examples, the modification being the result of using the average value variable V1(avg). The difference between each of the fifty original and modified output values is calculated, i.e., the original output from training example S4 in network N6, OUT(S4N6), is subtracted from the modified output from training example S4 in network N6, OUT(S4'N6). That difference value is squared, [OUT(S4'N6) - OUT(S4N6)]^2. This value is summed with the squared difference values for all combinations of networks and training examples for the iteration in which variable V1 was substituted with its average value V1(avg), i.e.,

$$\sum_{x=1}^{5} \sum_{n=1}^{10} \left[ \mathrm{OUT}(S'_x N_n) - \mathrm{OUT}(S_x N_n) \right]^2$$

Next, the process is repeated for variable #2, finding the differences between the original and modified outputs for each combination of network and training example, squaring, then summing the differences. This process is repeated for each variable until all 15 variables have been completed.

Each of the resultant sums is then normalized so that if all variables contributed equally to the single resultant output, the normalized value would be 1.0. Following the preceding example, the summed squared differences for each variable are summed to obtain a total summed squared difference for all variables. The value for each variable is divided by the total summed squared difference to normalize the contribution from each variable. From this information, the normalized value for each variable can be ranked in order of importance, with higher relative numbers indicating that the corresponding variable has a greater influence on the output. The sensitivity analysis of the input variables is used to indicate which variables played the greatest roles in generating the network output.

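
The averaging-and-substitution procedure of Step K can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program of the Computer Appendices: each trained network is assumed to be any callable that maps an input vector to a single output, and the names `sensitivity`, `rank_variables` and `scale_by_count` are hypothetical. Dividing each sum by the total, as described above, makes the normalized values sum to 1.0, so equal contribution gives 1/y per variable; the optional `scale_by_count` flag multiplies by y so that equal contribution scores 1.0.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a trained network is any function from inputs to one output.
Network = Callable[[Sequence[float]], float]


def sensitivity(networks: List[Network], examples: List[List[float]],
                scale_by_count: bool = False) -> List[float]:
    """Normalized sensitivity of each input variable over a consensus of networks."""
    n_vars = len(examples[0])
    # Average value V_j(avg) of each variable over all training examples.
    averages = [sum(ex[j] for ex in examples) / len(examples) for j in range(n_vars)]
    # Original outputs OUT(S_x N_n), computed once in forward mode.
    original = [[net(ex) for net in networks] for ex in examples]

    sums = []
    for j in range(n_vars):
        total = 0.0
        for x, ex in enumerate(examples):
            modified = list(ex)
            modified[j] = averages[j]          # S_x' with variable j replaced by its average
            for n, net in enumerate(networks):
                total += (net(modified) - original[x][n]) ** 2
        sums.append(total)

    grand_total = sum(sums)
    if grand_total == 0.0:
        return [0.0] * n_vars                  # no variable changed any output
    factor = n_vars if scale_by_count else 1
    return [factor * s / grand_total for s in sums]


def rank_variables(scores: Sequence[float]) -> List[int]:
    """Variable indices ordered from most to least influential."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

Passing several networks trained from different random starting weights, rather than one, is what turns this into the consensus sensitivity analysis discussed next.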
It has been found herein that using consensus networks to perform sensitivity analysis improves the variable selection process. For example, if two variables are highly correlated, a single neural network trained on the data might use only one of the two variables to produce the diagnosis. Since the variables are highly correlated, little is gained by including both, and the choice of which to include is dependent on the initial starting conditions of the network being trained. Sensitivity analysis using a single network might show that only one, or the other, is important. Sensitivity analysis derived from a consensus of multiple networks, each trained using different initial conditions, may reveal that both of the highly correlated variables are important. By averaging the sensitivity analysis over a set of neural networks, a consensus is formed that minimizes the effects of the initial conditions.

Chi-square contingency table

When dealing with sparse binary data, a positive response on a given variable might be highly correlated to the condition being diagnosed, but occur so infrequently in the training data that the importance of the variable, as indicated by the neural network sensitivity analysis, might be very low. In order to catch these occurrences, the Chi-square contingency table is used as a secondary ranking process. A 2X2 contingency table Chi-square test is performed on the binary variables, where each cell of the table is the observed frequency for the combination of the two variables (Fig. 3A, Step F). A 2X2 contingency table Chi-square test is also performed on the continuous variables using optimal thresholds (which might be empirically determined) (Step G). The binary and continuous variables are then ranked based on the Chi-square analysis (Step H).

The standard Chi-square 2X2 contingency table operative on the binary variables (Step F) is used to determine the significance of the relationship between a specific binary input variable and the desired output (as determined by comparing the training data with the known single output result). Variables that have a low Chi-square value are typically unrelated to the desired output.

For variables that have continuous values, a 2X2 contingency table can be constructed (Step G) by comparing the continuous variable to a threshold value. The threshold value is modified experimentally to yield the highest possible Chi-square value.
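
As an illustration of Steps F and G, the sketch below computes the ordinary Pearson Chi-square statistic (without continuity correction) for a 2X2 table and, for a continuous variable, tries candidate thresholds to find the one giving the largest value. It assumes a binary outcome coded 0/1 and uses the midpoints between adjacent observed values as candidate thresholds; both choices, and all function names, are illustrative assumptions rather than details given in this description.

```python
from typing import Optional, Sequence, Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson Chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0                        # a row or column of the table is empty
    return n * (a * d - b * c) ** 2 / denom


def binary_chi_square(x: Sequence[bool], y: Sequence[int]) -> float:
    """Cross-tabulate a binary variable x against a 0/1 outcome y (Step F)."""
    pairs = list(zip(x, y))
    a = sum(1 for xi, yi in pairs if xi and yi)
    b = sum(1 for xi, yi in pairs if xi and not yi)
    c = sum(1 for xi, yi in pairs if not xi and yi)
    d = sum(1 for xi, yi in pairs if not xi and not yi)
    return chi_square_2x2(a, b, c, d)


def best_threshold(values: Sequence[float], y: Sequence[int]) -> Tuple[float, Optional[float]]:
    """Try midpoints between adjacent observed values; keep the largest Chi-square (Step G)."""
    levels = sorted(set(values))
    best: Tuple[float, Optional[float]] = (0.0, None)
    for lo, hi in zip(levels, levels[1:]):
        t = (lo + hi) / 2.0
        chi = binary_chi_square([v > t for v in values], y)
        if chi > best[0]:
            best = (chi, t)
    return best
```

The (Chi-square, threshold) pairs from continuous variables and the Chi-square values from binary variables can then be sorted together for the common ranking of Step H.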

The Chi-square values of the continuous variables and of the binary variables can then be combined for common ranking (Step H). A second level of ranking can then be performed that combines the Chi-square-ranked variables with the sensitivity-analysis-ranked variables (Step L). This combining of rankings allows variables that are significantly related to the output but that are sparse (i.e., values that are positive or negative in only a small percentage of cases) to be included in the set of important variables. Otherwise, important information in such a non-linear system could easily be overlooked.

Selection of important variables from among the ranked variables

As noted above, important variables are selected from among the identified variables. Preferably, the selection is effected after ranking the variables, at which time a second level ranking process is invoked. A method for identification of important variables (parameters) or sets thereof for use in the decision-support systems is also provided. This method, while exemplified herein with reference to medical diagnosis, has broad applicability in any field, such as financial analysis and other endeavors that involve statistically-based prediction, in which important parameters or variables are selected from among a plurality.

In particular, a method for selecting effective combinations of variables is provided. After (1) providing a set of "n" candidate variables and a set of "selected important variables", which initially is empty; and (2) ranking all candidate variables based on a chi square and sensitivity analysis, as described above, the method involves: (3) taking the highest "m" ranked variables one at a time, where m is from 1 up to n, and evaluating each by training a consensus of neural nets on that variable combined with the current set of important variables; (4) selecting the best of the m variables, where the best variable is the one that most improves performance, and if it improves performance, adding it to the "selected important variable" set, removing it from the candidate set and continuing processing at step (3), otherwise continuing by going to step (5); (5) if all variables in the candidate set have been evaluated, the process is complete; otherwise, continue taking the next highest "m" ranked variables one at a time, evaluating each by training a consensus of neural nets on that variable combined with the current set of important selected variables, and performing step (4).
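
Steps (1) through (5) amount to a greedy forward selection. The sketch below is a minimal version under two assumptions not fixed by this description: the candidates arrive as a single combined ranking, and a hypothetical `evaluate` callback trains a consensus of networks on a given set of variables and returns a performance score where higher is better (for example, test-set accuracy from the holdout method described below).

```python
from typing import Callable, List, Sequence, Tuple


def greedy_select(ranked: Sequence[str],
                  evaluate: Callable[[List[str]], float],
                  m: int) -> Tuple[List[str], float]:
    """Greedy selection of important variables following steps (1)-(5)."""
    candidates = list(ranked)          # steps (1)-(2): ranked candidate set
    selected: List[str] = []           # "selected important variables", initially empty
    best_score = evaluate(selected)    # baseline performance of the current set
    start = 0
    while start < len(candidates):
        # Step (3): evaluate the next m ranked candidates one at a time.
        window = candidates[start:start + m]
        scores = [evaluate(selected + [v]) for v in window]
        best_i = max(range(len(window)), key=scores.__getitem__)
        if scores[best_i] > best_score:
            # Step (4): keep the best one and restart from the top of the ranking.
            best_var = window[best_i]
            selected.append(best_var)
            candidates.remove(best_var)
            best_score = scores[best_i]
            start = 0
        else:
            # Step (5): no improvement, so move on to the next m candidates.
            start += m
    return selected, best_score
```

The loop ends when a full pass over the remaining candidates finds no improvement, or when no candidates remain.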



In particular, the second level ranking process (Step L) starts by adding the highest ranked variable from the sensitivity analysis (Step K) to the set of important variables (Step H). Alternatively, the second level ranking process could be started with an empty set and then testing the top several (x) variables from each of the two sets of ranking. This second level ranking process uses the network training procedure (Step I) on a currently selected partition or subset of variables from the available data to train a set of neural networks. The ranking process is a network training procedure using the current set of "important" variables (which generally will initially be empty) plus the current variable being ranked or tested for ranking, and uses a greedy algorithm to optimize the set of input variables by myopically optimizing the input set based upon the previously identified important variable(s), to identify the remaining variable(s) which improve the output the most.

This training process is illustrated in Fig. 4. The number of inputs used by the neural network is controlled by excluding inputs which are found not to contribute significantly to the desired output, i.e., the known target output of the training data. A commercial computer program, such as ThinksPro™ neural networks for Windows™ (or TrainDos™, the DOS version) by Logical Designs Consulting, Inc., La Jolla, California, or any other such program that one of skill in the art can develop may be used to vary the inputs and train the networks.

A number of other commercially available neural network computer programs may be used to perform any of the above operations, including Brainmaker™, which is available from California Scientific Software Co., Nevada Adaptive Solutions, Beaverton, OR; Neural Network Utility/2™, from NeuralWare, Inc., Pittsburgh, PA; NeuroShell™ and NeuroWindows™, from Ward Systems Group, Inc., Frederick, MD. Other types of data mining tools, i.e., decision-support systems, that will provide the function of variable selection and network optimization may be designed, or other commercially available systems may be used. For example, NeuroGenetic Optimizer™ from BioComp Systems, Inc., Redmond, WA; and Neuro Forecaster/GENETICA, from New Wave Intelligent Business Systems (NIBS) Pte Ltd., Republic of Singapore, use genetic algorithms that are modelled on natural selection to eliminate poor-performing nodes within a network population while passing on the best performing rates to offspring nodes to "grow" an optimized network and to eliminate input variables which do not contribute significantly to the outcome. Networks based on genetic algorithms use mutation to avoid trapping in local minima and use crossover processes to introduce new structures into the population.

Knowledge discovery in data (KDD) is another data mining tool, i.e., a decision-support system, designed to identify significant relationships that exist among variables; KDD systems are useful when there are many possible relationships. A number of KDD systems are commercially available, including Darwin™, from Thinking Machines, Bedford, MA; Mineset™, from Silicon Graphics, Mountain View, CA; and Eikoplex™ from Ultragem Data Mining Company, San Francisco, CA. (Eikoplex™ has been used to provide classification rules for determining the probability of the presence of heart disease.) Others may be developed by those of skill in the art.

Proceeding with the ranking procedure, if, for example, x is set to 2, then the top two variables from each of the two ranking sets will be tested by the process (Fig. 3A, Steps L, S), and results are checked to see if the test results show improvement (Step T). If there is an improvement, the single best performing variable is added to the set of "important" variables, and then that variable is removed from the two rankings (Fig. 3B, Step U) for further testing (Step S). If there is no improvement, then the process is repeated with the next x variables from each set until an improvement is found or all of the variables from the two sets have been tested. This process is repeated until either the source sets are empty, i.e., all relevant or important variables have been included in the final network, or all of the remaining variables in the sets being tested are found to be below the performance of the current list of important variables. This process of elimination greatly reduces the number of subsets of the available variables which must be tested in order to determine the set of important variables. Even in the worst case, with ten available variables, the process would test only 34 subsets where x = 2, and only 19 subsets of the 1024 possible combinations if x = 1. Thus, where there are 100 available variables, only 394 subsets would be tested where x = 2. The variables from the network with the best test performance are thus identified for use (Fig. 3B, Step V).
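
The subset counts quoted above can be reproduced with a simple counting model. This model is our reading rather than something stated in the description: each round tests the top x variables from each of the two rankings, the two top-x lists are assumed never to overlap, so a round tests min(2x, r) distinct subsets when r candidates remain, and each round adds exactly one variable until none remain.

```python
def worst_case_subsets(n_vars: int, x: int) -> int:
    """Subsets tested under the assumed worst case: one variable added per round,
    each round testing min(2x, r) distinct candidates when r remain."""
    return sum(min(2 * x, r) for r in range(1, n_vars + 1))


def exhaustive_subsets(n_vars: int) -> int:
    """Number of possible combinations of n_vars variables (including the empty set)."""
    return 2 ** n_vars
```

Under this model, `worst_case_subsets(10, 2)` is 34, `worst_case_subsets(10, 1)` is 19 and `worst_case_subsets(100, 2)` is 394, compared with `exhaustive_subsets(10)` of 1024.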

Then the final set of networks is trained to perform the diagnosis (Fig. 4, Steps M, N, Q, R). Typically, a number of final neural networks are trained to perform the diagnosis. It is this set of neural networks that can form the basis of a deliverable product to the end user. Since different initial conditions (initial weights) can produce differing outputs for a given network, it is useful to seek a consensus. (The different initial weights are used to avoid error from trapping in local minima.) The consensus is formed by averaging the outputs of each of the trained networks, which then becomes the single output of the diagnostic test.

Training a consensus of networks 

Fig. 4 illustrates the procedure for the training of a consensus of neural networks. It is first determined whether the current training cycle is the final training step (Step M). If yes, then all available data are placed into the training data set (i.e., P = 1) (Step N). If no, then the available data are divided into P equal-sized partitions, randomly selecting the data for each partition (Step O). In an exemplary embodiment, for example, five partitions, e.g., P1-P5, are created from the full set of available training data. Then two constructions are undertaken (Step P). First, one or more of the partitions are copied to a test file and the remaining partitions are copied to a training file. Continuing the exemplary embodiment of five partitions, one of the partitions, e.g., P1, representing 20% of the total data set, is copied to the test file. The remaining four files, P2-P5, are identified as training data. A group of N neural networks is trained using the training partitions, each network having different starting weights (Step Q). Thus, in the exemplary embodiment, there will be 20 networks (N = 20) with starting weights selected randomly using 20 different random number seeds. Following completion of training for each of the 20 networks, the output values of all 20 networks are averaged to provide the average performance on the test data for the trained networks. The data in the test file (partition P1) are then run through the trained networks to provide an estimate of the performance of the trained networks. The performance is typically determined as the mean squared error of prediction, or misclassification rate. A final performance estimate is generated by averaging the individual performance estimates of each network to produce a completed consensus network (Step R). This method of training by partitioning the available data into a plurality of subsets is generally referred to as the "holdout method" of training. The holdout method is particularly useful when the data available for network training are limited.

Test set performance can be empirically maximized by performing various experiments that identify network parameters that maximize test set performance. The parameters that can be modified in this set of experiments are 1) the number of hidden processing elements, 2) the amount of noise added to the inputs, 3) the amount of error tolerance, 4) the choice of learning algorithm, 5) the amount of weight decay, and 6) the number of variables. A complete search of all possible combinations is typically not practical, due to the amount of processing time that is required. Accordingly, test networks are trained with training parameters chosen empirically via a computer program, such as ThinksPro™ or a user-developed program, or from the results of existing tests generated by others who are working in the field of interest. Once a "best" configuration is determined, a final set of networks can be trained on the complete data set.

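
A minimal Python sketch of the partitioning and consensus steps of Fig. 4 follows. The `train_fn` callback, which trains one network from a training set and a random seed and returns a callable model, is hypothetical and stands in for whatever network-training program is used; mean squared error is used as the performance measure, one of the two measures mentioned above. The defaults of five partitions and 20 networks mirror the exemplary embodiment.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Example = Tuple[List[float], float]              # (inputs, known target output)
Model = Callable[[List[float]], float]
TrainFn = Callable[[List[Example], int], Model]  # hypothetical: (training data, seed) -> model


def make_partitions(data: Sequence[Example], p: int, seed: int = 0) -> List[List[Example]]:
    """Randomly assign examples to p (nearly) equal-sized partitions (Step O)."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::p] for i in range(p)]


def consensus(models: Sequence[Model], inputs: List[float]) -> float:
    """Consensus output: the average of the individual network outputs."""
    return mean(m(inputs) for m in models)


def holdout_estimate(data: Sequence[Example], train_fn: TrainFn, p: int = 5,
                     n_networks: int = 20, test_index: int = 0,
                     seed: int = 0) -> Tuple[float, List[Model]]:
    """Train N networks on P-1 partitions and average their test-set MSE (Steps P-R)."""
    parts = make_partitions(data, p, seed)
    test = parts[test_index]
    train = [ex for i, part in enumerate(parts) if i != test_index for ex in part]
    # Each network starts from different random weights, i.e. a different seed.
    models = [train_fn(train, s) for s in range(n_networks)]
    per_network_mse = [mean((m(x) - y) ** 2 for x, y in test) for m in models]
    return mean(per_network_mse), models


def train_final(data: Sequence[Example], train_fn: TrainFn,
                n_networks: int = 20) -> List[Model]:
    """Final training step (P = 1): every available example is training data."""
    return [train_fn(list(data), s) for s in range(n_networks)]
```

Rotating `test_index` over all P partitions and averaging the results is a common extension; the description above holds out a single partition.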
3. Development of biochemical diagnostic test

A similar technique for isolating variables may be used to build or validate a biochemical diagnostic test, and also to combine biochemical diagnostic test data with the patient history diagnostic test to enhance the reliability of a medical diagnosis.

The selected biochemical test can include any test from which useful diagnostic information may be obtained in association with a patient and/or patient's condition. The test can be instrument or non-instrument based and can include the analysis of a biological specimen, a patient symptom, a patient indication, a patient status, and/or any change in these factors. Any of a number of analytical methods can be employed and can include, but are not limited to, immunoassays, bioassays, chromatography, monitors, and imagers. The analysis can assess analytes, serum markers, antibodies, and the like obtained from the patient through a sample. Further, information concerning the patient can be supplied in conjunction with the test. Such information includes, but is not limited to, age, weight, blood pressure, genetic history, and other such parameters or variables.

The exemplary biochemical test developed in this embodiment employs a standardized test format, such as the Enzyme Linked Immunosorbent Assay or ELISA test, although the information provided herein may apply to the development of other biochemical or diagnostic tests and is not limited to the development of an ELISA test (see, e.g., Molecular Immunology: A Textbook, edited by Atassi et al., Marcel Dekker Inc., New York and Basel 1984, for a description of ELISA tests). Information important to the development of the ELISA test can be found in the Western Blot test, a test format that determines antibody reactivity to proteins in order to characterize antibody profiles and extract their properties.

A Western Blot is a technique used to identify, for example, particular antigens in a mixture by separating these antigens on polyacrylamide gels, blotting onto nitrocellulose, and detecting with labeled antibodies as probes (see, for example, Basic and Clinical Immunology, Seventh Edition, edited by Stites and Terr, Appleton and Lange 1991, for information on Western Blots). It is, however, sometimes undesirable to employ the Western Blot test as a diagnostic tool. If, instead, ranges of molecular weight that contain information relevant to the diagnosis can be pre-identified, then this information can be "coded" into an equivalent ELISA test.

In this example, the development of an effective biochemical diagnostic test is dependent upon the availability of Western Blot data for the patients for whom the disease condition is known or suspected. Referring to Fig. 5, Western Blot data are used as a source (Step W), and the first step in processing the Western Blot data is to pre-process them for use by the neural network (Step X). Images are digitized and converted to fixed-dimension training records by using a computer to perform spline interpolation and image normalization. To use data from multiple Western Blot tests, it is necessary to align the images on a given gel based only on information in the image. Each input of a neural network needs to represent a specific molecular weight or range of molecular weights accurately. Normally, each gel produced contains a standards image for calibration, wherein the proteins contained are of known molecular weight, so that the standards image can also be used for alignment of images contained within the same Western Blot. For example, a standard curve can be used to estimate the molecular weight range of other images on the same Western Blot and thereby align the nitrocellulose strips.

Images are aligned by cubic spline interpolation, a method that guarantees smooth transitions at the data points represented by the standards. To avoid possible performance problems due to extrapolation, termination conditions are set so that extrapolation is linear. This alignment step minimizes the variations in the estimates of molecular weight for a given band on the output of the Western Blot.

The resultant scanned image is then processed to normalize the density of the image by scaling the density so that the darkest band has a scaled density of 1.0 and the lightest band is scaled to 0.0. The image is then processed into a fixed-length vector of numbers which become the inputs to a neural network, which at the outset must be trained as hereinafter explained.
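
The alignment and normalization steps can be sketched in pure Python as below. The sketch assumes each lane has already been digitized into a list of density values indexed by pixel position along the strip, that the pixel positions of the molecular-weight standards are known and strictly increasing, and that larger density values mean darker bands. The spline is a natural cubic spline (zero second derivative at the end standards) extrapolated linearly beyond them; the natural end condition, the linear resampling onto a fixed log-molecular-weight grid, and all function names are our assumptions, not details given above.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def natural_cubic_spline(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Natural cubic spline through (xs, ys), extrapolated linearly beyond the end knots.

    xs: strictly increasing pixel positions of the standards.
    ys: corresponding values, e.g. log10 of the standards' molecular weights.
    """
    k = len(xs) - 1
    if k < 1:
        raise ValueError("need at least two standards")
    h = [xs[i + 1] - xs[i] for i in range(k)]
    m = [0.0] * (k + 1)                 # second derivatives; m[0] = m[k] = 0
    if k >= 2:
        n = k - 1                       # unknowns m[1] .. m[k-1]
        sub = [h[r] for r in range(n)]
        diag = [2.0 * (h[r] + h[r + 1]) for r in range(n)]
        sup = [h[r + 1] for r in range(n)]
        rhs = [6.0 * ((ys[r + 2] - ys[r + 1]) / h[r + 1] - (ys[r + 1] - ys[r]) / h[r])
               for r in range(n)]
        for r in range(1, n):           # Thomas algorithm: forward elimination
            w = sub[r] / diag[r - 1]
            diag[r] -= w * sup[r - 1]
            rhs[r] -= w * rhs[r - 1]
        sol = [0.0] * n
        sol[-1] = rhs[-1] / diag[-1]
        for r in range(n - 2, -1, -1):  # back substitution
            sol[r] = (rhs[r] - sup[r] * sol[r + 1]) / diag[r]
        m[1:k] = sol
    # End slopes of the spline, used for linear extrapolation.
    slope_lo = (ys[1] - ys[0]) / h[0] - m[1] * h[0] / 6.0
    slope_hi = (ys[k] - ys[k - 1]) / h[k - 1] + m[k - 1] * h[k - 1] / 6.0

    def spline(x: float) -> float:
        if x <= xs[0]:
            return ys[0] + slope_lo * (x - xs[0])
        if x >= xs[k]:
            return ys[k] + slope_hi * (x - xs[k])
        i = bisect_right(xs, x) - 1
        hi, t0, t1 = h[i], xs[i + 1] - x, x - xs[i]
        return ((m[i] * t0 ** 3 + m[i + 1] * t1 ** 3) / (6.0 * hi)
                + (ys[i] / hi - m[i] * hi / 6.0) * t0
                + (ys[i + 1] / hi - m[i + 1] * hi / 6.0) * t1)

    return spline


def normalize_density(values: Sequence[float]) -> List[float]:
    """Scale so the darkest value becomes 1.0 and the lightest 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def lane_to_vector(std_pixels: Sequence[float], std_log_mw: Sequence[float],
                   lane_density: Sequence[float], grid: Sequence[float]) -> List[float]:
    """Map a lane onto a fixed log-MW grid and return the normalized input vector."""
    to_log_mw = natural_cubic_spline(std_pixels, std_log_mw)
    points = sorted((to_log_mw(float(px)), d) for px, d in enumerate(lane_density))
    xs = [p[0] for p in points]
    ds = [p[1] for p in points]
    out = []
    for g in grid:
        j = bisect_right(xs, g)
        if j == 0:
            out.append(ds[0])
        elif j == len(xs):
            out.append(ds[-1])
        else:                            # linear interpolation between neighbours
            x0, x1 = xs[j - 1], xs[j]
            frac = (g - x0) / (x1 - x0) if x1 > x0 else 0.0
            out.append(ds[j - 1] + frac * (ds[j] - ds[j - 1]))
    return normalize_density(out)
```

Because every lane is resampled onto the same grid, each position of the returned vector corresponds to the same molecular weight on every blot, which is what allows it to serve as a fixed neural network input.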

A training example is built in a process similar to that previously described, where the results generated from the processing of the Western Blot data are trained (Step Y). To minimize the recognized problems of dependency on starting weights, redundancy among interdependent variables, and desensitivity resulting from overtraining a network, it is helpful to train a set of neural networks (consensus) on the data by the partitioning method discussed previously.

From the sensitivity analysis of the training runs on the processed Western Blot data, regions of significantly contributing molecular weights (MW) can be determined and identified (Step AA). As part of the isolation step, inputs in contiguous regions are preferably combined into "bins" as long as the sign of the correlation between the input and the desired output is the same. This process reduces the typical 100-plus inputs produced by the Western Blot, plus the other inputs, to a much more manageable number of inputs, fewer than about twenty.
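
The binning rule can be sketched as follows. The sketch assumes each Western Blot input already has an importance score from the sensitivity analysis and a correlation with the desired output. The significance threshold, and summing the inputs of a bin to form one combined input, are illustrative assumptions, since the description does not say how bin values are formed.

```python
from typing import List, Optional, Sequence, Tuple

Bin = Tuple[int, int, int]   # (first input index, last input index, correlation sign)


def contiguous_bins(importance: Sequence[float], correlation: Sequence[float],
                    threshold: float) -> List[Bin]:
    """Group adjacent significant inputs whose correlation with the output has one sign."""
    bins: List[Bin] = []
    start: Optional[int] = None
    sign = 0
    for i, (imp, corr) in enumerate(zip(importance, correlation)):
        s = (corr > 0) - (corr < 0)
        significant = imp >= threshold and s != 0
        if significant and start is not None and s == sign:
            continue                           # extend the current bin
        if start is not None:
            bins.append((start, i - 1, sign))  # close the current bin
            start = None
        if significant:
            start, sign = i, s                 # open a new bin
    if start is not None:
        bins.append((start, len(importance) - 1, sign))
    return bins


def apply_bins(inputs: Sequence[float], bins: Sequence[Bin]) -> List[float]:
    """Hypothetical combination rule: sum the inputs that fall in each bin."""
    return [sum(inputs[a:b + 1]) for a, b, _ in bins]
```

A change of correlation sign always closes a bin, so positively and negatively correlated regions stay separate, consistent with the later requirement that regions of differing sign not be combined into one test.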

In a particular embodiment, it may be found that several ranges of molecular weight correlate with the desired output, indicative of the condition being diagnosed. A correlation may be either positive or negative. A reduced input representation may be produced by using a Gaussian region centered on each of the peaks found in the Western Blot training, with a standard deviation determined so that the value of the Gaussian was below 0.5 at the edges of the region.

In a specific embodiment, the basic operation to generate the neural network input is to perform a convolution between the Gaussian and the Western Blot image, using the log of the molecular weight for the calculation.

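
A sketch of one Gaussian-weighted input follows, assuming the aligned lane has been reduced to (molecular weight, density) samples. Evaluating the convolution at the region centre gives a Gaussian-weighted sum of densities. The edge value of 0.45 is a hypothetical choice that satisfies the "below 0.5 at the edges" condition, and base-10 logarithms are assumed.

```python
import math
from typing import Sequence


def gaussian_sigma(half_width: float, edge_value: float = 0.45) -> float:
    """Standard deviation (in log10 MW units) at which the Gaussian equals
    edge_value at a distance half_width from its centre."""
    if not 0.0 < edge_value < 1.0:
        raise ValueError("edge_value must lie strictly between 0 and 1")
    return half_width / math.sqrt(-2.0 * math.log(edge_value))


def gaussian_input(mol_weights: Sequence[float], density: Sequence[float],
                   center_log_mw: float, half_width: float,
                   edge_value: float = 0.45) -> float:
    """Gaussian-weighted sum of lane densities around one region centre,
    computed on the log of the molecular weight."""
    sigma = gaussian_sigma(half_width, edge_value)
    total = 0.0
    for mw, d in zip(mol_weights, density):
        offset = math.log10(mw) - center_log_mw
        total += d * math.exp(-offset ** 2 / (2.0 * sigma ** 2))
    return total
```

One such value per identified peak replaces the many raw density inputs covering that molecular weight region.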
The data may be tested using the holdout method, as previously described. For example, five partitions might be used where, in each partition, 80% of the data are used for training and 20% of the data are used for testing. The data are shuffled so that each of the partitions is likely to have examples from each of the gels.

Once the molecular weight regions important to diagnosis have been identified (Step AA), one or more tests for the selected region or regions of molecular weight may be built (Step AB). The ELISA biochemical test is one example. The selected region or regions of molecular weight identified as important to the diagnosis may then be physically identified and used as a component of the ELISA biochemical test. Whereas regions of the same correlation sign may, or may not, be combined into a single ELISA test, regions of differing correlation signs should not be combined into a single test. The value of such a biochemical test may then be determined by comparing the biochemical test result with the known or suspected medical condition.

In this example, the development of a biochemical diagnostic test may be enhanced by combining patient data and biochemical data in a process shown in Fig. 2. Under these conditions, the patient history diagnostic test is the basis for the biochemical diagnostic test. As explained herein, the variables that are identified as important variables are combined with data derived from the Western Blot data in order to train a set of neural networks to be used to identify molecular weight regions that are important to a diagnosis.

Referring to Fig. 2, Western Blot data are used as a source (Step W) and pre-processed for use by the neural network as described previously (Step X). A training example is built in a process similar to that previously described, wherein the important variables from the patient history data and the results generated from the processing of the Western Blot data are combined and are trained using the combined data (Step Y). In parallel, networks are trained on patient history data, as described above (Step Z).

To minimize the recognized problems of dependency on starting weights, redundancy among interdependent variables, and desensitivity resulting from overtraining a network, it was found preferable to train a set of neural networks (consensus set) on the data by the partitioning method. From the sensitivity analysis of the training runs on patient history data alone and on combined data, regions of significantly contributing molecular weights can be determined and identified as previously described (Step XA). As a further step in the isolation process, a set of networks is thereafter trained using as inputs the combined patient history and bin information in order to isolate the important bins for the Western Blot data. The "important bins" represent the important regions of molecular weight related to the diagnosis, considering the contribution of patient history information. These bins are either positively or negatively correlated with the desired output of the diagnosis.

Once the molecular weight regions important to diagnosis have been identified (Step AA), one or more tests for the selected region or regions may be built and validated as previously described (Step AB). The designed ELISA tests are then produced and used to generate ELISA data for each patient in the database (Step AC). Using ELISA data and the important patient history data as input, a set of networks is trained using the partition approach as described above (Step AE). The partition approach can be used to obtain an estimate of the lower bound of the performance of the biochemical test. The final training (Step AE) of a set of networks, i.e., the networks to be used as a deliverable product, is made using all available data as part of the training data. If desired, new data may be used to validate the performance of the diagnostic test (Step AF). The performance on all the training data becomes the upper bound on the performance estimate for the biochemical test. The consensus of the networks represents the intended diagnostic test output (Step AG). This final set of neural networks can then be used for diagnosis.

4. Improvement of neural network performance 

An important feature of the decision-support systems, as exemplified with the neural networks, and methods provided herein is the ability to improve performance. The training methodology outlined above may be repeated as more information becomes available. During operation, all input and output variables are recorded and augment the training data in future training sessions. In this way, the diagnostic neural network may adapt to individual populations and to gradual changes in population characteristics.

If the trained neural network is contained within an apparatus that allows the user to enter the required information and outputs to the user the neural network score, then the process of improving performance through use may be automated. Each entry and corresponding output is retained in memory. Since the steps for retraining the network can be encoded into the apparatus, the network can be re-trained at any time with data that are specific to the population.

5. Method for evaluating the effectiveness of a diagnostic test or course of treatment

Typically, the effectiveness or usefulness of a diagnostic test is determined by comparing the diagnostic test result with the patient medical condition that is either known or suspected. A diagnostic test is considered to be of value if there is good correlation between the diagnostic test result and the patient medical condition; the better the correlation between the diagnostic test result and the patient medical condition, the higher the value placed on the effectiveness of the diagnostic test. In the absence of such a correlation, a diagnostic test is considered to be of lesser value. The systems provided herein provide a means to assess the effectiveness of a biochemical test by determining whether the variable that corresponds to that test is an important selected variable. Any test that yields data that improve the performance of the system is identified.



A method by which the effectiveness of a diagnostic test may be determined, independent of the correlation between the diagnostic test result and the patient medical condition (Fig. 6), is described below. A similar method may be used to assess the effectiveness of a particular treatment.

In one embodiment, the method compares the performance of a patient history diagnostic neural network trained on patient data alone with the performance of a combined neural network trained on the combination of patient historical data and biochemical test data, such as ELISA data. Patient history data are used to isolate important variables for the diagnosis (Step AH), and final neural networks are trained (Step AJ), all as previously described. In parallel, biochemical test results are provided for all or a subset of the patients for whom the patient data are known (Step AK), and a diagnostic neural network is trained on the combined patient and biochemical data by first isolating important variables for the diagnosis (Step AL), and subsequently training the final neural networks (Step AM), all as previously described.

The performance of the patient history diagnostic neural network
derived from Step AJ is then compared, in Step AN, with the performance
of the combined diagnostic neural network derived from Step AM.

The performance of a diagnostic neural network may be measured by any
number of means. In one example, the correlations between each
diagnostic neural network output and the known or suspected medical
condition of the patient are compared. Performance can then be
measured as a function of this correlation. There are many other ways to
measure performance. In this example, any increase in the performance of
the diagnostic neural network derived from Step AM over that derived
from Step AJ is used as a measure of the effectiveness of the
biochemical test.

A biochemical test in this example, and any diagnostic test in
general, that lacks sufficient correlation between that test result and the
known or suspected medical condition is traditionally considered to be of
limited utility. Such a test may be shown to have some use through the
method described above, thereby enhancing the effectiveness of a test
that otherwise might be considered uninformative. The method
described herein serves two functions: it provides a means of evaluating
the usefulness of a diagnostic test, and it also provides a means of
enhancing the effectiveness of a diagnostic test.
6. Application of the methods to identification of variables for
diagnosis and development of diagnostic tests

The methods and networks provided herein provide a means to, for
example, identify important variables, improve upon existing biochemical
tests, develop new tests, assess therapeutic progress, and identify new
disease markers. To demonstrate these advantages, the methods
provided have been applied to endometriosis and to pregnancy related
events, such as the likelihood of labor and delivery during a particular
period.

Endometriosis

The methods described herein have provided a means to develop a
non-invasive methodology for the diagnosis of endometriosis. In addition,
the methods herein provide means to develop biochemical tests that
provide data indicative of endometriosis, and also to identify and develop
new biochemical tests.

The methodology for variable selection and use of decision-support
systems has been applied to endometriosis. A decision-support system,
in this instance a consensus of neural networks, has been developed for
the diagnosis of endometriosis. In the course of this development, which
is detailed in the EXAMPLES, it was found that it was possible to develop
neural networks capable of aiding in the diagnosis of endometriosis that
rely only on patient historical data, i.e., data that can be obtained from a
patient by questionnaire format. It was found that biochemical test data
could be used to enhance the performance of a particular network, but it
was not essential to its value as a diagnostic tool. The variable selection
protocol and neural nets provide a means to select sets of variables that
can be inputted into the decision-support system to provide a means to
diagnose endometriosis. While some of the identified variables include
those that have traditionally been associated with endometriosis, others
of the variables have not. In addition, as noted above, variables such as
pelvic pain and dysmenorrhea that have been associated with
endometriosis are not linearly correlated with it to a degree that permits
diagnosis.

Exemplary decision-support systems are described in the Examples.
For example, one neural net, designated pat07 herein, is described in
Example 14. Comparison of the output of the pat07 network with
the probability of having endometriosis yields a positive correlation (see
Table 1). The pat07 network can predict the likelihood of a woman having
endometriosis based on her pat07 score. For example, if a woman has a
pat07 score of 0.6, then she has a 90% probability of having
endometriosis; if her pat07 score is 0.4, then she has a 10% probability
of having endometriosis. The dynamic range of pat07 output when
applied to the database was about 0.3 to about 0.7. Theoretically, the
output values can range from 0 to 1, but values below 0.3 or above 0.7
were not observed. Over 500 women have been evaluated using the
pat07 network, and its performance can be summarized as follows:

TABLE 1

Pat07 Score      Endometriosis (% of Total)
< 0.40           10
0.40 - 0.45      30
0.45 - 0.55      50
0.55 - 0.60      70
> 0.60           90

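TABLE 1 amounts to a lookup from a pat07 score to the observed percentage of women with endometriosis in that score band. The sketch below encodes that lookup. The handling of band edges is an assumption: the table does not say which band an edge value belongs to, so lower edges are treated as inclusive here, which matches the statement that women scoring 0.6 or greater fall in the 90% group (the text's own example of a 0.4 score is ambiguous under the table). The function name is hypothetical.

```python
# TABLE 1 as (lower edge of score band, % of women with endometriosis),
# checked from the highest band down. Lower edges are assumed inclusive,
# so a score of exactly 0.60 maps to 90%.
PAT07_BANDS = [(0.60, 90), (0.55, 70), (0.45, 50), (0.40, 30)]


def pat07_likelihood(score: float) -> int:
    """Return the observed percentage with endometriosis for a pat07 score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("pat07 output is expected to lie between 0 and 1")
    for lower_edge, percent in PAT07_BANDS:
        if score >= lower_edge:
            return percent
    return 10  # band "< 0.40"


for s in (0.35, 0.42, 0.50, 0.58, 0.65):
    print(s, pat07_likelihood(s))
```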


The pat07 network score is interpreted as the likelihood of having
endometriosis, and not as whether or not a woman is diagnosed with
endometriosis. The likelihood is based on the relative incidence of
endometriosis found in each score group. For example, in the group of
women with a pat07 network score of 0.6 or greater, 90% of these women
had endometriosis, and 10% of these women did not. This likelihood
relates to the population of women at infertility clinics,
Software programs have been developed that contain the pat07 network.
One program, referred to as adezacrf.exe, provides a single-screen
Windows interface that allows the user to obtain the pat07 network score
for a woman. The user enters values for all 14 variables, and the pat07
network score is calculated following every keystroke. Another program,
designated adzcrf2.exe, is almost exactly the same as adezacrf.exe,
except that it allows for one additional input: the value of an ELISA test.
This program and network are a specific example of a method of expanding
the clinical utility of a diagnostic test. The ELISA test results did not
correlate with endometriosis; by itself, the ELISA test does not have
clinical utility. As an additional input parameter, however, the ELISA test
improved network performance, so that one may assert that incorporating
the ELISA result as an input for network analysis expanded the clinical
utility of that ELISA test. Another program (provided herein in Appendix II,
designated adzcrf2.exe) provides a multiple-screen Windows interface that
allows the user to obtain the pat07 network score for a woman. The
multiple data entry screens guide the user to enter all patient historical
data, and not just those parameters required as inputs for pat07. The
pat07 score is calculated after all data are entered and accepted as
correct by the user. This program also saves the data entered in *.fdb
files, can import data, calculate pat07 scores on imported data, and
export data. The user can edit previously entered data. All three of the
above programs serve as specific examples of the diagnostic software for
endometriosis.

Figure 11 illustrates an exemplary interface screen used in the
diagnostic software for endometriosis. The display 1100, which is
provided as a Microsoft Windows™-type display, provides a template for
entry of numerical values for each of the important variables which have
been determined for diagnosis of endometriosis. Input of data to perform
a test is accomplished using a conventional keyboard alone, or in
combination with a computer mouse, a trackball or joystick. For purposes
of this description, a mouse/keyboard combination will be used.

Each of the text boxes 1101-1106 is for entry of numerical values
representative of the important variables Age (box 1101); Number of
Pregnancies (box 1102); Number of Births (box 1103); Number of
Abortions (box 1104); Number of Packs of Cigarettes Smoked per Day
(box 1105); and ELISA test results (box 1106). To enter the subject
patient's age, the user moves the mouse so that the pointer on the screen
is in box 1101, then clicks on that location. Entry of the number
representative of the patient's age is done using the keyboard. The
remaining boxes are accessed by pointing and clicking at the selected box.

Boxes 1107-1115 are important selected variables for which the
data are binary, i.e., either "yes" or "no". The boxes and the variables
are correlated as follows:

BOX     VARIABLE
1107    Past History of Endometriosis
1108    Dysmenorrhea
1109    Hypertension During Pregnancy
1110    Pelvic Pain
1111    Abnormal PAP/Dysplasia
1112    History of Pelvic Surgery
1113    Medication History
1114    Genital Warts
1115    Diabetes



A "yes" to any one of these variables can be indicated by pointing
at the corresponding box and clicking the mouse button to indicate an
"X" within the box.

The network automatically processes the data after every
keystroke, so changes will be seen in the output values displayed in text
boxes 1118-1120 after every entry into the template 1100. Text box
1118, labelled "Endo", provides consensus network output for the
presence of endometriosis; text box 1119, labelled "No Endo", provides
consensus network output for the absence of endometriosis; and text box
1120 provides a relative score indicative of whether or not the patient has
endometriosis. It is noted that the score in text box 1120 is an
artificial number derived from boxes 1118 and 1119 that makes it easier
for the physician to interpret results. As presently set, a value in this box
in the positive range up to 25 is indicative of having endometriosis, and a
value in the negative range down to -25 is indicative of not having
endometriosis. The selected transformation permits the physician to
interpret the pat07 output more readily.
As described in the Examples, pat07 is not the only network
that is predictive of endometriosis. Other networks, designated pat08
through pat23a, have been developed. These are also predictive of
endometriosis. All these networks perform very similarly, and can readily
be used in place of pat07. Thus, by following the methodology used to
develop pat07, other similarly functioning neural nets can be and have
been developed. Pat08 and pat09 are the most similar to pat07: these
networks were developed by following the protocol outlined above, and
were allowed to select important variables from the same set as that used
for development of pat07.
It was found that the initial weighting of variables can have effects
on the outcome of the variable selection procedure, but not on the
ultimate diagnostic result. Pat08 and pat09 used the same database of
patient data as pat07 to derive the disease relevant parameters. Pat10
through pat23a were training runs originally designed to elucidate the
importance of certain parameters: history of endometriosis, history of
pelvic surgery, dysmenorrhea and pelvic pain. For development of these,
the importance of a variable was assessed by withholding that variable
from the variable selection process. It was found that, when a variable
was withheld from the variable selection process and from training of the
final consensus networks, network performance did not significantly
deteriorate. Thus, although a particular variable, or set of variables, may
have appeared to be significant in predicting endometriosis, networks
trained in the absence of such variables do not have a markedly reduced
ability to predict endometriosis. These results demonstrate (1) the
effectiveness of the methodology for variable selection and consensus
network training, and (2) the adaptability of networks in general. In the
absence of one type of data, the network found other variable(s) from
which to extract that information. In the absence of one variable, the
network selected different variables in its place and maintained
performance.

Patients suspected of having endometriosis typically must undergo
exploratory surgery to diagnose the disease. The ability to diagnose this
disorder reliably using patient history information and, optionally,
biochemical test data, such as western blot data, provides a highly
desirable alternative to surgery. The methods herein and the identified
variables provide a means to do so.

Data related to the diagnosis of endometriosis have been gathered.
The data include patient history data, western blot data and ELISA data.
Application of the methodology herein, as shown in the EXAMPLES,
demonstrated that patient history data alone can be predictive of
endometriosis.

To assess the performance of the variable selection protocol and to
ascertain where the 14-variable network (pat07) ranked (in performance)
compared to all possible combinations of the 14 variables, networks were
trained on every possible combination of the variables (16,384
combinations). Also, the variable selection protocol was applied to the
set of 14 variables. From among the 14, 5 variables were selected.
These are pregnancy hypertension, number of births, abnormal
PAP/dysplasia, history of endometriosis and history of pelvic surgery.
This combination ranked as the 68th best performing combination out of
the 16,384 possible combinations (99.6 percentile), thereby demonstrating
the effectiveness of the variable selection protocol. Also, the combination
that includes all 14 variables was ranked 718th out of the 16,384 possible
combinations (95.6 percentile).

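The 16,384 figure is 2^14, the number of subsets of 14 candidate variables, and the percentiles quoted above follow from rank / 16,384. The sketch below shows only the enumeration and the percentile arithmetic; training and scoring a network for each subset is outside its scope. Note that 2^14 counts the empty subset as well; whether the empty subset was evaluated is not stated in the text. The function names are hypothetical.

```python
from itertools import combinations

N_VARIABLES = 14
TOTAL_COMBINATIONS = 2 ** N_VARIABLES  # 16,384 subsets, including the empty one


def all_variable_subsets(variables):
    """Yield every subset of the candidate variables, smallest first."""
    for size in range(len(variables) + 1):
        yield from combinations(variables, size)


def percentile_of_rank(rank, total=TOTAL_COMBINATIONS):
    """Express a 1-based performance rank (1 = best) as a percentile."""
    return 100.0 * (1.0 - rank / total)


print(sum(1 for _ in all_variable_subsets(range(N_VARIABLES))))  # 16384
print(round(percentile_of_rank(68), 1), round(percentile_of_rank(718), 1))  # 99.6 95.6
```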
These results also show that subsets of the 14 variables are useful.
In particular, any subset of the selected set of parameters, particularly the
set of fourteen variables, that contains one (or more) of the following
combinations of three variables can be used with a decision-support
system for diagnosis of endometriosis:

a) number of births, history of endometriosis, history of pelvic
surgery;

b) diabetes, pregnancy hypertension, smoking;

c) pregnancy hypertension, abnormal pap smear/dysplasia,
history of endometriosis;

d) age, smoking, history of endometriosis;

e) smoking, history of endometriosis, dysmenorrhea;

f) age, diabetes, history of endometriosis;

g) pregnancy hypertension, number of births, history of
endometriosis;

h) smoking, number of births, history of endometriosis;

i) pregnancy hypertension, history of endometriosis, history of
pelvic surgery;

j) number of pregnancies, history of endometriosis, history of
pelvic surgery;

k) number of births, abnormal PAP smear/dysplasia, history of
endometriosis;

l) number of births, abnormal PAP smear/dysplasia,
dysmenorrhea;

m) history of endometriosis, history of pelvic surgery,
dysmenorrhea; and

n) number of pregnancies, history of endometriosis,
dysmenorrhea.

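Whether a candidate set of variables qualifies under the list above reduces to a subset test against each of the fourteen triplets. The sketch below performs that test; the lower-case variable spellings are assumptions chosen for illustration, not identifiers used by the described software. Applied to the 5 variables selected from the 14 (with "abnormal PAP/dysplasia" written as "abnormal pap smear/dysplasia"), it reports triplets a, c, g, i and k.

```python
# The fourteen three-variable combinations a) to n) listed above
# (variable spellings are illustrative assumptions).
CORE_TRIPLETS = {
    "a": {"number of births", "history of endometriosis", "history of pelvic surgery"},
    "b": {"diabetes", "pregnancy hypertension", "smoking"},
    "c": {"pregnancy hypertension", "abnormal pap smear/dysplasia", "history of endometriosis"},
    "d": {"age", "smoking", "history of endometriosis"},
    "e": {"smoking", "history of endometriosis", "dysmenorrhea"},
    "f": {"age", "diabetes", "history of endometriosis"},
    "g": {"pregnancy hypertension", "number of births", "history of endometriosis"},
    "h": {"smoking", "number of births", "history of endometriosis"},
    "i": {"pregnancy hypertension", "history of endometriosis", "history of pelvic surgery"},
    "j": {"number of pregnancies", "history of endometriosis", "history of pelvic surgery"},
    "k": {"number of births", "abnormal pap smear/dysplasia", "history of endometriosis"},
    "l": {"number of births", "abnormal pap smear/dysplasia", "dysmenorrhea"},
    "m": {"history of endometriosis", "history of pelvic surgery", "dysmenorrhea"},
    "n": {"number of pregnancies", "history of endometriosis", "dysmenorrhea"},
}


def matching_triplets(selected_variables):
    """Return the labels of every listed triplet fully contained in the selection."""
    selected = {v.strip().lower() for v in selected_variables}
    return sorted(label for label, triplet in CORE_TRIPLETS.items() if triplet <= selected)


five_selected = ["pregnancy hypertension", "number of births",
                 "abnormal PAP smear/dysplasia", "history of endometriosis",
                 "history of pelvic surgery"]
print(matching_triplets(five_selected))  # ['a', 'c', 'g', 'i', 'k']
```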
As shown in the Examples, other sets of important selected
variables that perform similarly to the enumerated 14 variables can be
obtained. Other smaller subsets thereof may also be identified.

Predicting pregnancy related events, such as the likelihood of
delivery within a particular time period

The methods herein may be applied to any disorder or condition,
and are particularly suitable for conditions in which no diagnostic test can
be adequately correlated or for which no biochemical test or convenient
biochemical test is available. For example, the methods herein have been
applied to predicting pregnancy related events, such as the likelihood of
delivery within a particular time period.

Determination of impending birth is of importance, for example, for
increasing neonatal survival of infants born before 34 weeks. The
presence of fetal fibronectin in secretion samples from the vaginal cavity
or the cervical canal from a pregnant patient after week 20 of pregnancy
is associated with a risk of labor and delivery before 34 weeks. Methods
and kits for screening for fetal fibronectin in body fluids and tissues,
particularly in secretion samples from the vaginal cavity or the cervical
canal, of a pregnant patient after week 20 of pregnancy are available
(see, U.S. Patent Nos. 5,51^,702, 5,468,619, 5,281,522, and
5,096,830; see, also, U.S. Patent Nos. 5,236,846, 5,223,440, and
5,185,270).

The correlation between the presence of fetal fibronectin in these
secretions and labor and delivery before 34 weeks is not perfect;
there are significant false-negative and false-positive rates.
Consequently, to address the need for methods to assess the likelihood of
labor and delivery before 34 weeks and to improve the predictability of
the available tests, the methods herein have been applied to development
of a decision-support system that assesses the likelihood of certain
pregnancy related events. In particular, neural nets for predicting delivery
before (and after) 34 weeks of gestation have been developed. Neural
networks and other decision-support systems developed as described
herein can improve the performance of the fetal fibronectin (fFN) test by
lowering the number of false positives. The results, which are shown in
EXAMPLE 13, demonstrate that use of the methods herein can improve
the diagnostic utility of existing tests by improving predictive
performance. Exemplary neural networks and implementing software
(Appendix III) are also provided herein.

Preterm Delivery Risk Assessment Software

The Pre-term Delivery Risk Assessment Software (designated
ptdinp.exe in Appendix III) provides a means to input patient
historical information and fFN test results into a database of fixed length
ASCII records, and to perform the calculations necessary to generate
inputs to three neural network tests used to evaluate the patient's risks
related to pre-term delivery. The software generates outputs that define
the risk of preterm delivery. The Preterm Delivery Risk Assessment
Software provided herein classifies the fFN ELISA positive results into 3
clinically distinct groups. In so doing, more than 50% of the fFN ELISA
false positive results can be immediately identified. Moreover, about 35%
of the true positive results can be rescued. The combination of the
Preterm Delivery Risk Assessment Software with the ELISA test result
provides new information which the clinician can use to improve the
management of symptomatic patients. In particular, the outputs define the
risk of delivery less than or equal to 34 weeks, 6 days; the risk of delivery
less than or equal to 7 days from time of sampling for fFN; and the risk of
delivery less than or equal to 14 days from time of sampling for fFN. The
exemplified software uses neural networks designated EGA6, EGA7f and
EGA14f (see Examples) herein, but can be used with any nets provided
herein or developed based on the methods provided herein. The source
code for the software is set forth in Appendix III.

I 

The following is a description of the operation, inputs and outputs
of the software.

A. User Interface

A typical user interface is depicted in FIGURES 12-15 and
exemplary printed outputs are depicted in FIGURES 16A and 16B.



Main Menu

The main menu, tool bar and results display appear as shown in
FIGURE 12. The various fields in the main window are calculated as
follows:

File: The name of the fdb file opened by the user.

Current Record: The record number, counting from 1, that is
currently displayed.

Number of records: The count of records contained in the open
file.

Lab ID #: The contents of the associated field in the fixed length
data record entered by the user.

Patient Name: The first name, middle initial, and last name of the
patient from the fixed length data record.

Pre-term Delivery Risk ≤ 34 weeks 6 days: The consensus
score from the ega6 set of neural networks.

Pre-term Delivery Risk ≤ 7 days: The consensus score from the
egad7f set of neural networks.

Pre-term Delivery Risk ≤ 14 days: The consensus score from the
egad14f set of neural networks.

The File item on the main menu contains the following sub menu
items and functions:

Open: Open a fixed length database file (.fdb) for entry and
examination.

Print: Print the current record in one of the two formats as
specified in the Options menu.

Print Setup: Provides standard Windows support for print functions
setup.

Print Preview: Provides standard Windows support for print
viewing.

MRU List: Provides a list of the four Most Recently Used files.

Exit: Exits the program.

The Record item on the main menu will contain the following sub
menu items and functions:

First Record: Display the first record in the database file.

Next Record: Display the next record in the database file.

Previous Record: Display the previous record in the database file.

Last Record: Display the last record in the database file.

Go to Record: Opens a dialog to go to a specific record number or
search for a specific Lab ID #.

Edit Record: Opens a dialog to allow the examination and
modification of data in the displayed record.

New Record: Creates a new record at the end of the database and
automatically edits the record.

The Options item on the main menu will contain the following sub
menu items and functions:

Print full form: When checked, the print function will print the full
record as shown in the edit record dialog box. When unchecked, the print
function will print the information shown in the main window. The default
setting is unchecked.

Clear sub fields: When checked, sub fields will be cleared when the
corresponding field is unchecked in the edit dialog. The default setting is
checked.

The View item on the main menu will contain the following sub
menu items and functions:

Toolbar: Standard Windows Toolbar appears when checked.

Status Bar: Standard Windows Status Bar appears when checked.

The Help item on the main menu will contain the following sub menu
items and functions:

About PTDinp: Provides the version number, program icon, and
developer of the program.

Tool Bar buttons will be provided for the following functions:
File Open
View First Record
View Previous Record
View Next Record
View Last Record
Edit Record
New Record
Go To Record
Help About



Edit Dialog

An exemplary Edit Record dialog box is set forth in FIGURE 13.
Through this dialog the user can examine, change or input patient specific
data into a fixed length database record. The table below provides the
size and location of each item in the fixed length database record. For
entry into the dialog box, relevant items are checked; all checked items are
assigned a value of 1, and all others are assigned a value of 0. The
alphanumeric fields in the dialog box, such as Lab ID #, name, date of
birth, EGA boxes, G (gravity), P (parity) and A (abortions), are assigned the
entered values. The table below (TRUE = checked, FALSE = unchecked)
summarizes how the information entered into the dialog box is
converted for storage in the fixed length database record.



NAME                                         POSITION  WIDTH  DESCRIPTION
LAB ID #                                     1         12     ASCII text
LAST NAME                                    13        24     ASCII text
FIRST NAME                                   37        24     ASCII text
MIDDLE INITIAL                               61        2      ASCII text
DATE OF BIRTH                                93        10     ASCII mm/dd/yy
ETHNIC ORIGIN WHITE                          103       2      0 = FALSE 1 = TRUE
ETHNIC ORIGIN BLACK                          105       2      0 = FALSE 1 = TRUE
ETHNIC ORIGIN ASIAN                          107       2      0 = FALSE 1 = TRUE
ETHNIC ORIGIN HISPANIC                       109       2      0 = FALSE 1 = TRUE
ETHNIC ORIGIN NATIVE AMERICAN                111       2      0 = FALSE 1 = TRUE
ETHNIC ORIGIN OTHER                          113       2      0 = FALSE 1 = TRUE
MARITAL STATUS                               115       2      1 = Single 2 = Married
                                                              3 = Divorced 4 = Widowed
                                                              5 = Living with partner
                                                              6 = Other
                                                              (only one box can be checked)
(illegible)                                  117       2      0 = No 1 = Yes
Vaginal Bleeding                             119       2      0 = N/A (check if sub field ...)
                                                              1 = Trace 2 = Medium 3 = Gross
Uterine Contractions                         121       2      0 = FALSE 1 = TRUE
Intermittent lower abdominal pain,           123       2      0 = FALSE 1 = TRUE
  dull low back pain
Bleeding during the second or third          125       2      0 = FALSE 1 = TRUE
  trimester
Menstrual-like or intestinal (illegible)     127       2      0 = FALSE 1 = TRUE
Change in vaginal discharge                  129       2      0 = FALSE 1 = TRUE
Patient is not "feeling right"               131       2      0 = FALSE 1 = TRUE
Number/hr.                                   133       2      0 = Uterine Contractions FALSE
                                                              1 = "<1" 2 = "1-3" 3 = "4-6"
EGA by SONO                                  135       8      ASCII weeks.days format
EGA by LMP                                   143       8      ASCII weeks.days format
EGA at Sampling                              151       8      ASCII weeks.days format
GRAVITY (G:)                                 159       2      ASCII number
PARITY (P:)                                  161       2      ASCII number
ABORTIONS (A:)                               163       2      ASCII number
Number of Preterm delivery                   165       2      0 = NONE 1 = "1" 2 = "2"
                                                              3 = ">2"
No previous pregnancies                      167       2      1 = "Gravity = 0"
Previous pregnancies with no complications   169       2      0 = FALSE 1 = TRUE
History of Preterm delivery                  171       2      0 = FALSE 1 = TRUE
History of preterm PROM                      173       2      0 = FALSE 1 = TRUE
History of incompetent cervix                175       2      0 = FALSE 1 = TRUE
History of PIH/preeclampsia                  177       2      0 = FALSE 1 = TRUE
History of SAB prior to 20 weeks             179       2      0 = FALSE 1 = TRUE
Multiple Gestation                           181       2      0 = NONE (unchecked)
                                                              1 = "Twins" 2 = "Triplets"
                                                              3 = "Quads"
Uterine or cervical abnormality              183       2      0 = FALSE 1 = TRUE
(illegible)                                  185       2      0 = FALSE 1 = TRUE
Gestational Diabetes                         187       2      0 = FALSE 1 = TRUE
(illegible)                                  189       2      0 = FALSE 1 = TRUE
Dilatation                                   191       2      0 = Unk or None checked
                                                              1 = "<1" 2 = "1" 3 = "1-2"
                                                              4 = "2" 5 = "2-3" 6 = "3"
                                                              7 = ">3"
Cervical Consistency                         193       2      blank = unchecked
                                                              1 = Firm 2 = Mod 3 = Soft
Antibiotics                                  195       2      0 = FALSE 1 = TRUE
Corticosteroids                              197       2      0 = FALSE 1 = TRUE
Tocolytics                                   199       2      0 = FALSE 1 = TRUE
Insulin                                      201       2      0 = FALSE 1 = TRUE
Antihypertensives                            203       2      0 = FALSE 1 = TRUE
Medication: None                             205       2      0 = FALSE 1 = TRUE
Medication: Unknown                          207       2      0 = FALSE 1 = TRUE
Qualitative fFN Result                       209       2      0 = FALSE 1 = TRUE
≤ 34.6 Net Output Positive                   211       20     ASCII coded float
≤ 34.6 Net Output Negative                   231       20     ASCII coded float
≤ 7 Day Net Output Positive                  251       20     ASCII coded float
≤ 7 Day Net Output Negative                  271       20     ASCII coded float
≤ 14 Day Net Output Positive                 291       20     ASCII coded float
≤ 14 Day Net Output Negative                 311       20     ASCII coded float

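Because every item has a fixed 1-based POSITION and WIDTH, reading or writing a record is a matter of string slicing. The sketch below handles a subset of the fields in the table above. Several details are assumptions not stated in the text: the record is taken to be exactly 330 characters (the last field starts at 311 and is 20 wide) with no line terminator, values are assumed to be left-justified and space-padded, and the field keys are hypothetical names rather than identifiers from Appendix III.

```python
# (field key, 1-based start POSITION, WIDTH) for a subset of the table above.
# Keys are hypothetical names chosen for illustration.
FIELDS = [
    ("lab_id", 1, 12),
    ("last_name", 13, 24),
    ("first_name", 37, 24),
    ("middle_initial", 61, 2),
    ("date_of_birth", 93, 10),
    ("ega_by_sono", 135, 8),
    ("ega_by_lmp", 143, 8),
    ("ega_at_sampling", 151, 8),
    ("qualitative_ffn", 209, 2),
    ("net_34_6_positive", 211, 20),
    ("net_14_day_negative", 311, 20),
]
RECORD_LENGTH = 330  # assumed: last field starts at 311 and is 20 wide


def parse_record(record: str) -> dict:
    """Slice each field out of a fixed-length record and strip the padding."""
    return {name: record[start - 1:start - 1 + width].strip()
            for name, start, width in FIELDS}


def build_record(values: dict) -> str:
    """Write values into a space-padded record (left-justified, cut to width)."""
    buf = [" "] * RECORD_LENGTH
    for name, start, width in FIELDS:
        text = str(values.get(name, "")).ljust(width)[:width]
        buf[start - 1:start - 1 + width] = list(text)
    return "".join(buf)


rec = build_record({"lab_id": "A123", "last_name": "Doe", "ega_by_sono": "30.4",
                    "qualitative_ffn": "1", "net_14_day_negative": "0.25"})
print(parse_record(rec)["ega_by_sono"])  # 30.4
```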


Go To Dialog

The Go To dialog box is shown in FIGURE 14. The user may enter
either the record number or the Lab ID number. When OK is pressed, the
record is found and displayed based on the information contained in a
database record.

Help About Dialog

The Help About dialog box, which can provide information such as
the title of the software, version and copyright information, is shown in
FIGURE 15.

B. Pre-term Delivery Risk Evaluation

1. Loading the Networks

When a new database is opened or the program is first run, the
neural networks associated with the risk evaluations are loaded. For each
risk evaluation there are 8 neural networks that must be loaded. This is
performed by repeated calls to the LoadNet function of the ThinksPro
TKSDLL.DLL (a WINDOWS™ dynamic link library). Other suitable
programs can be used to run the neural networks described herein. The
LoadNet function automatically loads the weights associated with each
network.

For the ≤ 34 weeks, 6 days evaluation the following nets
(described in the Examples) are loaded:

Ega6_0
Ega6_1
Ega6_2
Ega6_3
Ega6_4
Ega6_5
Ega6_6
Ega6_7

For the ≤ 7 days evaluation the following nets are loaded:

Egad7f0
Egad7f1
Egad7f2
Egad7f3
Egad7f4
Egad7f5
Egad7f6
Egad7f7

For the ≤ 14 days evaluation the following nets are loaded:

Egad14f0
Egad14f1
Egad14f2
Egad14f3
Egad14f4
Egad14f5
Egad14f6
Egad14f7

2. Processing the Inputs and Outputs

To run the evaluation of the pre-term delivery risks, data from the
database record must be processed for use by the neural networks. The
networks are run for a given evaluation when the "calculate risk" button
is pressed in the edit record dialog (FIGURE 13). The positive outputs
(described below) of each network are averaged together to produce the
value that is displayed, printed and placed in the database. The negative
outputs (described below) are averaged and the result is placed in the
database only.

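The consensus step described above averages the positive outputs, and separately the negative outputs, of the 8 networks loaded for one evaluation. The sketch below shows only that averaging. It does not call ThinksPro or TKSDLL.DLL: each "network" is a hypothetical stand-in callable that returns a (positive, negative) pair, and the dummy networks are invented for illustration.

```python
from statistics import fmean
from typing import Callable, Sequence, Tuple

# Hypothetical stand-in: a "network" is any callable mapping an input
# vector to a (positive_output, negative_output) pair.
Network = Callable[[Sequence[float]], Tuple[float, float]]


def consensus(networks: Sequence[Network], inputs: Sequence[float]) -> Tuple[float, float]:
    """Average positive and negative outputs over all networks of one evaluation."""
    outputs = [net(inputs) for net in networks]
    positive = fmean(p for p, _ in outputs)  # displayed, printed and stored
    negative = fmean(n for _, n in outputs)  # stored in the database only
    return positive, negative


# Eight dummy networks whose positive outputs are 0.0, 0.1, ..., 0.7.
dummy_nets = [lambda x, k=k: (0.1 * k, 1.0 - 0.1 * k) for k in range(8)]
print(consensus(dummy_nets, [0.0] * 7))
```

Keeping both averages mirrors the record layout, which stores a positive and a negative output field for each of the three evaluations.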
a. For the ≤34 weeks, 6 days (referred to herein as 34.6) evaluation

The ≤34.6 networks use 11 inputs generated from the database record. These inputs are calculated as follows.

1. Ethnic Origin White: 1.0 input if TRUE, 0.0 input if FALSE.

2. Marital Status Living with Partner: 1.0 input if TRUE, 0.0 input if FALSE.

3. EGA by SONO: Convert from weeks.days to weeks.

4. Val1 = EGA by LMP: Convert from weeks.days to weeks. Val2 = EGA by SONO: Convert from weeks.days to weeks. If Val2 <= 13.0 then input is Val2; else if the difference between Val1 and Val2 is > 2 then input is Val1; else input is Val2.

5. EGA at Sample: Convert from weeks.days to weeks.

6. If Dilatation none then input is 0.0.
If Dilatation < 1 then input is 0.0.
If Dilatation 1 then input is 1.0.
If Dilatation 1-2 then input is 1.5.
If Dilatation 2 then input is 2.0.
If Dilatation 2-3 then input is 2.0.
If Dilatation 3 then input is 3.0.
If Dilatation > 3 then input is 3.0.

7. If Number of Preterm Delivery = 0 then input is 0.0.
If Number of Preterm Delivery = 1 then input is 1.0.
If Number of Preterm Delivery = 2 then input is 2.0.
If Number of Preterm Delivery > 2 then input is 3.0.

8. Vaginal Bleeding: 1.0 input if TRUE, 0.0 input if FALSE.

9. If Cervical Consistency unchecked then input is 1.823197.
If Cervical Consistency Firm then input is 1.0.
If Cervical Consistency Mod then input is 2.0.
If Cervical Consistency Soft then input is 3.0.

10. Previous pregnancies with no complications: 1.0 input if TRUE, 0.0 input if FALSE.

11. FFN Result: 1.0 input if Positive, 0.0 input if negative.
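Several of the ≤34.6 inputs involve small conversions rather than simple flags. The following is a minimal sketch of three of them (the weeks.days conversion, the choice between LMP and sonogram dates in input 4, and the table lookups for inputs 6 and 9). It assumes, beyond what the text states, that weeks.days values arrive as strings such as "34.6" (34 weeks, 6 days), that the "difference" in input 4 is an absolute difference, and that the category labels used as dictionary keys are hypothetical stand-ins for the form's check boxes.

```python
# Sketch of selected <=34.6 input encodings (not the original program).
# Assumptions: 'weeks.days' strings like "34.6" mean 34 weeks + 6 days,
# the input-4 "difference" is absolute, and the category labels are hypothetical.

DILATATION = {  # input 6
    "none": 0.0, "<1": 0.0, "1": 1.0, "1-2": 1.5,
    "2": 2.0, "2-3": 2.0, "3": 3.0, ">3": 3.0,
}

CERVICAL_CONSISTENCY = {  # input 9; None means the box was left unchecked
    None: 1.823197, "Firm": 1.0, "Mod": 2.0, "Soft": 3.0,
}


def weeks_days_to_weeks(value: str) -> float:
    """Convert a 'weeks.days' string such as '34.6' to fractional weeks."""
    weeks, _, days = value.partition(".")
    return int(weeks) + (int(days) if days else 0) / 7.0


def ega_input(ega_lmp: str, ega_sono: str) -> float:
    """Input 4: choose between the LMP and sonogram gestational ages."""
    val1 = weeks_days_to_weeks(ega_lmp)   # EGA by LMP
    val2 = weeks_days_to_weeks(ega_sono)  # EGA by SONO
    if val2 <= 13.0:                      # early sonogram: use it
        return val2
    if abs(val1 - val2) > 2:              # later sonogram disagrees with LMP
        return val1
    return val2


def flag(value: bool) -> float:
    """Binary fields (ethnic origin, bleeding, fFN result, ...): 1.0 if TRUE."""
    return 1.0 if value else 0.0
```

For example, `ega_input("20.0", "24.0")` returns 20.0, because the sonogram age is past 13 weeks and differs from the LMP age by 4 weeks.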
b. For the ≤7 days evaluation

The ≤7 day networks use 7 inputs generated from the database record. These inputs are calculated as follows.

1. Ethnic Origin White: 1.0 input if TRUE, 0.0 input if FALSE.

2. Uterine Contractions: 1.0 input if TRUE, 0.0 input if FALSE.

3. Number of Abortions: Convert to float.

4. Vaginal Bleeding: 1.0 input if TRUE, 0.0 input if FALSE.

5. If Number/hr unchecked then input 0.0.
If Number/hr < 1 then input 1.0.
If Number/hr 1-3 then input 2.0.
If Number/hr 4-6 then input 3.0.
If Number/hr 7-9 then input 4.0.
If Number/hr 10-12 then input 5.0.
If Number/hr > 12 then input 6.0.

6. No previous pregnancies: 1.0 input if TRUE, 0.0 input if FALSE.

7. FFN Result: 1.0 input if Positive, 0.0 input if negative.
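The ≤7 day input vector can be sketched as below. The binning of contractions per hour (input 5) is the only non-trivial step. This is an illustrative assumption-laden version: the argument names are hypothetical, and contractions per hour are assumed to be a whole-number count, or `None` when the Number/hr box is unchecked.

```python
# Sketch of the <=7 day input vector; argument names are hypothetical.
# Contractions per hour are assumed to be an integer count, or None if unchecked.
from typing import List, Optional


def flag(value: bool) -> float:
    """Binary fields: 1.0 if TRUE/Positive, 0.0 otherwise."""
    return 1.0 if value else 0.0


def contractions_input(per_hour: Optional[int]) -> float:
    """Input 5: map contractions per hour onto the 0.0-6.0 scale."""
    if per_hour is None:          # Number/hr unchecked
        return 0.0
    if per_hour < 1:
        return 1.0
    for upper, code in ((3, 2.0), (6, 3.0), (9, 4.0), (12, 5.0)):
        if per_hour <= upper:     # bins 1-3, 4-6, 7-9, 10-12
            return code
    return 6.0                    # more than 12 per hour


def seven_day_inputs(white: bool, contractions: bool, n_abortions: int,
                     bleeding: bool, per_hour: Optional[int],
                     no_previous_pregnancies: bool,
                     ffn_positive: bool) -> List[float]:
    """Assemble the seven inputs in the order listed above."""
    return [
        flag(white),                    # 1. Ethnic Origin White
        flag(contractions),             # 2. Uterine Contractions
        float(n_abortions),             # 3. Number of Abortions
        flag(bleeding),                 # 4. Vaginal Bleeding
        contractions_input(per_hour),   # 5. Number/hr
        flag(no_previous_pregnancies),  # 6. No previous pregnancies
        flag(ffn_positive),             # 7. fFN Result
    ]
```

The same `contractions_input` mapping applies unchanged to input 5 of the ≤14 day networks.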

c. For the ≤14 days evaluation

The ≤14 day networks use 7 inputs generated from the database record. These inputs are calculated as follows.

1. Ethnic Origin Native American: 1.0 input if TRUE, 0.0 input if FALSE.

2. Marital Status Living with Partner: 1.0 input if TRUE, 0.0 input if FALSE.

3. Uterine Contractions: 1.0 input if TRUE, 0.0 input if FALSE.

4. If Dilatation none then input is 0.0.
If Dilatation < 1 then input is 0.0.
If Dilatation 1 then input is 1.0.
If Dilatation 1-2 then input is 1.5.
If Dilatation 2 then input is 2.0.
If Dilatation 2-3 then input is 2.0.
If Dilatation 3 then input is 3.0.
If Dilatation > 3 then input is 3.0.

5. If Number/hr unchecked then input 0.0.
If Number/hr < 1 then input 1.0.
If Number/hr 1-3 then input 2.0.
If Number/hr 4-6 then input 3.0.
If Number/hr 7-9 then input 4.0.
If Number/hr 10-12 then input 5.0.
If Number/hr > 12 then input 6.0.

6. No previous pregnancies: 1.0 input if TRUE, 0.0 input if FALSE.

7. FFN Result: 1.0 input if Positive, 0.0 input if negative.

3. Print Functions and Output Interpretation
Based on the print full form option (options menu), the full form is printed if the option is checked, and only the results are printed if the option is not checked. FIGURES 16A and 16B show exemplary output formats, with the risk indices for each network, which are interpreted according to the following tables:

Risk of Preterm Delivery (Delivery before 34 weeks 6 days gestation)

Risk Index        Interpretation
≤ 0.30            low risk
> 0.30            high risk

Risk of Delivery within 14 days of sampling for fFN Qual ELISA.

Risk Index        Interpretation
< 0.10            low risk
0.10 - 0.40       moderate risk
> 0.40            high risk

Risk of Delivery within 7 days of sampling for fFN Qual ELISA.

Risk Index        Interpretation
< 0.05            low risk
0.05 - 0.60       moderate risk
> 0.60            high risk
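The reporting step (averaging the positive outputs of the networks, then reading the risk index against the tables above) can be sketched as follows. The network labels "34.6", "14d" and "7d" are hypothetical. The tables do not say whether the quoted range endpoints are inclusive, so the sketch assumes the moderate-risk ranges include both ends.

```python
# Sketch of the risk-index step; network labels are hypothetical and the
# moderate-risk ranges are assumed to include both endpoints.
from typing import Sequence


def consensus(positive_outputs: Sequence[float]) -> float:
    """Average the positive output of each network in the ensemble."""
    return sum(positive_outputs) / len(positive_outputs)


def interpret(network: str, risk_index: float) -> str:
    """Map a consensus risk index onto the interpretation tables."""
    if network == "34.6":                      # two-level table
        return "low risk" if risk_index <= 0.30 else "high risk"
    low, high = {"14d": (0.10, 0.40), "7d": (0.05, 0.60)}[network]
    if risk_index < low:
        return "low risk"
    if risk_index <= high:
        return "moderate risk"
    return "high risk"
```

For instance, a ≤7 day ensemble whose positive outputs are 0.2 and 0.4 yields a consensus of 0.3, which is reported as moderate risk.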



D. Software Performance

As demonstrated below, the Preterm Delivery Risk Assessment Software supplements the fFN ELISA results in a clinically useful manner. By combining patient history and symptom information with the fFN ELISA test results, the software is able to assess the risk of preterm delivery more accurately. The data presented here suggest that the software is better able to discriminate those women truly at risk for preterm delivery: whereas the fFN ELISA test has a relatively high false positive rate and a low positive predictive value, the software test reduces false positive observations by over 50% and doubles the predictive value of the positive result. The fFN ELISA test allowed clinicians to identify those patients not at risk for preterm delivery. Given the significant increase in relative risk and the risk classification of the software test, the clinician may now also identify those women who are at risk for preterm delivery. This capability represents a new advance in the clinical management of women who are experiencing symptoms of preterm labor.

In particular, the performance of the Preterm Delivery Risk Assessment Software has been evaluated on 763 women experiencing at least one of the following symptoms of preterm labor:

1. Uterine contractions, with or without pain.

2. Intermittent lower abdominal pain, dull low backache, pelvic pressure.

3. Bleeding during the second or third trimester.

4. Menstrual-like or intestinal cramping, with or without diarrhea.

5. Change in vaginal discharge (amount, color or consistency).

6. Not "feeling right".

All 763 women were tested for fFN using the Qualitative fFN ELISA test. Based solely on the ELISA test, 149 women tested positive for fFN, of whom only 20 (13.4%) delivered within 7 days and 25 (16.8%) delivered within 14 days.

The low positive predictive value of the fFN ELISA test is improved by the Preterm Delivery Risk Assessment Software, which combines the fFN ELISA result with other patient information.

Table 1 compares the performance of the Qualitative fFN ELISA Test with the Preterm Delivery Risk Assessment Software Test for predicting delivery before 35 completed weeks of gestation. The number of false positive observations decreased from 105 to 42, or about 60%. The decrease in false positive results is accompanied by a corresponding increase in true negative results, from 584 for the fFN ELISA to 647 for the software test. A reduction in false negative results was also observed, from 30 for the ELISA test to 25 for the software test. Accordingly, the software raises the sensitivity of the ELISA test from 59.5% to 66.2% and the specificity from 84.8% to 93.9%. The positive predictive value nearly doubles, increasing from 29.5% to 53.9%, and both the odds ratio and the relative risk are increased substantially.
MEASURE           QUAL fFN ELISA TEST   RISK ASSESSMENT SOFTWARE TEST
True Positive     44                    49
False Positive    105                   42
True Negative     584                   647
False Negative    30                    25
Sensitivity       59.5%                 66.2%
Specificity       84.8%                 93.9%
Pos PV            29.5%                 53.9%
Neg PV            95.1%                 96.3%
Odds Ratio        8.2                   30.2
Relative Risk     6.0                   14.6

Table 1. Performance comparison of the Qualitative fFN ELISA Test and the Preterm Delivery Risk Assessment Software Test relative to risk of preterm delivery before 35 completed weeks of gestation. The Risk Assessment Software combines fFN ELISA Test results with patient history and symptom information to provide a more accurate assessment of the risk of preterm delivery (before 35 completed weeks of gestation).
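The derived rows of Tables 1 to 3 follow from the four counts in each column using the standard definitions. The sketch below makes those definitions explicit. Relative risk is deliberately omitted, because the text does not state which formula was used to compute it.

```python
# Sketch of the standard 2x2 performance measures used in Tables 1-3.
# Relative risk is omitted because its formula is not given in the text.
from typing import Dict


def performance(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    return {
        "sensitivity": tp / (tp + fn),        # deliveries correctly flagged
        "specificity": tn / (tn + fp),        # non-deliveries correctly cleared
        "ppv": tp / (tp + fp),                # positive predictive value
        "npv": tn / (tn + fn),                # negative predictive value
        "odds_ratio": (tp * tn) / (fp * fn),  # cross-product ratio
    }
```

With the ELISA column of Table 1, `performance(44, 105, 584, 30)` gives a sensitivity of about 0.595, a specificity of about 0.848 and an odds ratio of about 8.2. This makes it a quick way to check each derived row against its counts.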

Table 2 compares the performance of the two tests relative to risk of preterm delivery within 7 days. The largest difference between the two tests is the reduction in false positive results with the software compared with the ELISA test. The software decreased the number of false positive observations from 129 to 57, or about 56%. Accompanying this decrease in false positive results is the matching increase in true negative results, from 611 to 683. The true positive and false negative results remained essentially unchanged. The sensitivity and specificity of the software test are both higher than those of the ELISA test: compare the sensitivity of 91.3% for the software with 87.0% for the ELISA, and the specificity of 92.3% for the software with 82.6% for the ELISA. Furthermore, the software test doubles the positive predictive value, increasing it from 13.4% to 26.9%. Finally, the odds ratio is quadrupled and the relative risk more than tripled by the software.



MEASURE           QUAL fFN ELISA TEST   RISK ASSESSMENT SOFTWARE TEST
True Positive     20                    21
False Positive    129                   57
True Negative     611                   683
False Negative    3                     2
Sensitivity       87.0%                 91.3%
Specificity       82.6%                 92.3%
Pos PV            13.4%                 26.9%
Neg PV            99.5%                 99.7%
Odds Ratio        31.6                  125.8
Relative Risk     27.4                  89.7

Table 2. Performance comparison of the Qualitative fFN ELISA Test and the Preterm Delivery Risk Assessment Software Test relative to risk of preterm delivery within 7 days.

Table 3 compares the performance of the two tests relative to risk of preterm delivery within 14 days. Once again, the software decreases false positive test results when compared with the ELISA test, from 124 to 55, or about 56%. This decrease in false positive results is matched by the corresponding increase in true negative results, from 609 to 678. The numbers of true positive and false negative results were unchanged. While the sensitivity of the test was unaffected, the specificity rose nearly 10 points, from 83.1% to 92.5%. As seen before, the positive predictive value nearly doubled, increasing from 16.8% to 31.3%, and the odds ratio and relative risk increased substantially, from 24.6 to 61.6 and from 20.7 to 44.7, respectively.
MEASURE           QUAL fFN ELISA TEST   RISK ASSESSMENT SOFTWARE TEST
True Positive     25                    25
False Positive    124                   55
True Negative     609                   678
False Negative    5                     5
Sensitivity       83.3%                 83.3%
Specificity       83.1%                 92.5%
Pos PV            16.8%                 31.3%
Neg PV            99.2%                 99.3%
Odds Ratio        24.6                  61.6
Relative Risk     20.7                  44.7

Table 3. Performance comparison of the Qualitative fFN ELISA Test and the Preterm Delivery Risk Assessment Software Test relative to risk of preterm delivery within 14 days.

The following examples are included for illustrative purposes only and are not intended to limit the scope of the invention.

EXAMPLE 1
Evaluation of Patient History Data for Relevant Variables

This example demonstrates selection of the candidate variables.
Requirements

The patient history is evaluated to determine which variables are relevant to the diagnosis. This is done by performing a sensitivity analysis on each of the variables to be used in the diagnosis.

Two methods can be used to perform this analysis. The first is to train a network on all the information and determine, from the network weights, the influence of each input on the network output. The second method is to compare the performance of two networks, one trained with the variable included and the second trained with the variable eliminated. This training would be performed for each of the suspected relevant variables. Those that did not contribute to the performance would be eliminated. These operations are performed to lower the dimension of the inputs to the network. When training with limited amounts of data, a lower input dimension will increase the generalization capabilities of the network.
Analysis of Data

The data used for this example included 510 patient histories. Each record contained 120 text and numeric fields. Of these fields, 45 were identified as being known before surgery and always containing information. These fields were used as the basic available variables for the analysis and training of networks. A summary of the variables used in this example is as follows:



1. Age (preproc): preprocessed to normalize the age to fall between 0 and 1
2. Diabetes
3. Preg Induced DM
4. Hypertension
5. Preg hyperplasia
6. Autoimmune Disease
7. Transplant
8. Packs/Day
9. Drug Use
10. #Preg
11. #Birth
12. #Abort
13. Hist of Infertility
14. Ovulatory
15. Anovulatory
16. Unknown
17. Oligoovulatory
18. Hormone Induced
19. Herpes
20. Genital Warts
21. Other sexually transmitted diseases (STD)
22. Vag Infections
23. Pelvic inflammatory disease (PID)
24. Uterine/Tubal Anomalies
25. Fibroids
26. Ectopic Preg.
27. Dysfunctional Uterine Bld.
28. Ovarian Cyst
29. Polycystic Ovarian Synd
30. Abnormal PAP/Dysplasia
31. Gyn Cancer
32. Other HX
33. Past HX of Endo
34. Hist of Pelvic Surgery
35. Medication History
36. Current Exog. Hormone
37. Pelvic Pain
38. Abdominal Pain
39. Menstrual Abnormalities
40. Dysmenorrhea
41. Dyspareunia
42. Infertility
43. Adnexal Masses/Thickening
44. Undetermined
45. Other Symptoms





Methodology Used

The most commonly used method for determining the importance of variables is to train a neural network on the data with all the variables included. Using the trained network as the basis, a sensitivity analysis is performed on the network and the training data. For each training example, the network is run in the forward mode (no training) and the network outputs are recorded. Then, for each input variable, the network is rerun with the variable replaced by its average value over the training examples. The difference in output values is squared and accumulated. This process is repeated for each training example. The resulting sums are then normalized so that the sum of the normalized values equals the number of variables. In this way, if all variables contribute equally to the output, their normalized value would be 1.0. The normalized values can then be ranked in order of importance.
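The mean-substitution sensitivity analysis just described can be sketched in a few lines. Here `model` is a hypothetical stand-in for a trained single-output network run in forward mode only, not the ThinksPro interface or any other library API. The scores are scaled so that equal contributions give 1.0 each, which means they sum to the number of variables.

```python
# Sketch of the mean-substitution sensitivity analysis described above.
# `model` is a hypothetical single-output callable standing in for a trained
# network run in forward mode (no training).
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def sensitivity_analysis(model: Model,
                         examples: Sequence[Sequence[float]]) -> List[float]:
    n_vars = len(examples[0])
    # Average value of each input over the training examples.
    means = [sum(ex[i] for ex in examples) / len(examples) for i in range(n_vars)]
    sums = [0.0] * n_vars
    for ex in examples:
        base = model(ex)                  # recorded network output
        for i in range(n_vars):
            probe = list(ex)
            probe[i] = means[i]           # replace variable i by its mean
            diff = model(probe) - base
            sums[i] += diff * diff        # squared difference, accumulated
    total = sum(sums)
    if total == 0.0:
        return [0.0] * n_vars             # no input changes the output
    # Scale so the values sum to n_vars: equal contributions give 1.0 each.
    return [n_vars * s / total for s in sums]


def rank(scores: Sequence[float]) -> List[int]:
    """Variable indices ordered from most to least important."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

With a toy model that ignores its second input, for example `sensitivity_analysis(lambda x: x[0], [[0.0, 5.0], [2.0, 7.0]])`, all of the importance (2.0 for two variables) falls on the first input.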

There are several problems with the above approach. First, it is dependent on the neural network solution found. If different network starting weights are used, a different ranking might be found. Second, if two variables are highly correlated, the use of either would contain sufficient information. Depending on the network training run, only one of the variables might be identified as important. The third problem is that an overtrained network can distort the true importance of a variable.

i 

To minimize the effects of the above problems, several networks were trained on the data. The training process was refined to produce the best possible test set performance, so that the networks had learned the underlying relationship between the inputs and the desired outputs. By the end of this process, both a good set of networks would be available and the training configuration for the final trained networks would be established. The sensitivity analysis was performed on each of the networks trained, and the normalized values were averaged. For this example, a training run included 15 networks trained on five partitions of the available data using a holdout method.

Once the ranking of variables has been established, test runs are made to determine the effect that eliminating variables has on the test set performance. Eliminating a variable that has a small contribution should lower the test set performance only slightly. When overtraining is an issue, due to limited training data, eliminating variables can actually improve the test set performance. To save on processing time, groups of variables can be eliminated in a test based on the ranking.
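The averaging of normalized sensitivities across networks and the ranking-based elimination of variable groups can be sketched as follows. `evaluate` is a hypothetical callback that would train and test networks on a given subset of variable indices and return test-set performance. The rule of keeping the smallest subset with equal or better performance is an assumption made for illustration; the text does not state a stopping rule.

```python
# Sketch of averaging sensitivities over several networks and then dropping
# the lowest-ranked variables a group at a time. `evaluate` is hypothetical;
# the "keep the smallest subset that is no worse" rule is an assumption.
from typing import Callable, List, Sequence, Tuple


def average_scores(per_network: Sequence[Sequence[float]]) -> List[float]:
    """Average each variable's normalized sensitivity across networks."""
    return [sum(col) / len(col) for col in zip(*per_network)]


def eliminate_by_rank(scores: Sequence[float],
                      evaluate: Callable[[List[int]], float],
                      group_size: int = 1) -> Tuple[List[int], float]:
    # Most important variables first.
    kept = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best_subset, best_perf = list(kept), evaluate(kept)
    while len(kept) > group_size:
        kept = kept[:-group_size]          # drop the lowest-ranked group
        perf = evaluate(kept)
        if perf >= best_perf:              # fewer inputs, no loss in performance
            best_subset, best_perf = list(kept), perf
    return sorted(best_subset), best_perf
```

Because each call to `evaluate` may itself involve a full holdout training run, a larger `group_size` trades resolution for processing time, as the text notes.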
Results 

The rankings of variables are as follows and are reported for networks trained in run pat05.
01. 35. Medication History
02. 33. Past History of Endo
03. 11. #Birth
04. 37. Pelvic Pain
05. 40. Dysmenorrhea
06. 34. Hist of Pelvic Surgery
07. 1. Age (preproc)
08. 13. Hist of Infert
09. 8. Packs/Day
10. 36. Current Exog. Hormones
11. 42. Infertility
12. 18. Hormone Induced
13. 15. Anovulatory
14. 14. Ovulatory
15. 43. Adnexal Masses/Thickening
16. 45. Other Symptoms
17. 30. Abnormal PAP/Dysplasia
18. 26. Ectopic Preg.
19. 19. Herpes
20. 39. Menstrual Abnormalities
21. 12. #Abort
22. 41. Dyspareunia
23. 24. Uterine/Tubal Anomalies
24. 31. Gyn Cancer
25. 32. Other History
26. 10. #Preg
27. 28. Ovarian Cyst
28. 25. Fibroids
29. 22. Vag Infections
30. 16. Unknown
31. 27. Dysfunctional Uterine Bld.
32. 38. Abdominal Pain
33. 5. Preg hyperplasia
34. 9. Drug Use
35. 20. Genital Warts
36. 3. Preg Induced DM
37. 4. Hypertension
38. 21. Other STD
39. 23. PID
40. 44. Undetermined
41. 2. Diabetes
42. 17. Oligoovulatory
43. 6. Autoimmune Disease
44. 29. Polycystic Ovarian Synd
45. 7. Transplant

Subsets of the variables were tested, and a final set of 14 variables was selected to train the pat07 networks [see Examples 13 and 14]. Some variables not in the above top 14 were used because they improved the test set performance. The rankings for the pat07 networks are as follows:



01. 10. Past history of Endo
02. 6. #Birth
03. 14. Dysmenorrhea
04. 1. Age (preproc)
05. 13. Pelvic Pain
06. 11. Hist of Pelvic Surgery
07. 4. Packs/Day
08. 12. Medication History
09. 5. #Preg
10. 7. #Abort
11. 9. Abnormal PAP/Dysplasia
12. 3. Preg hyperplasia
13. 8. Genital Warts
14. 2. Diabetes



Conclusions

The set of variables identified in this example appears to be reasonable based on the testing and information.

EXAMPLE 2
Train Networks on Patient History Data

This example, using the above 14 variables, demonstrates methods for setting and optimizing various parameters.
Requirements
At the completion of the above examples, train a set of networks on the reduced patient history and record their performance. Experiments were run to determine the best configuration and parameters for training of the networks. An analysis of the performance was performed to determine the number of false positives and false negatives, to see if a given subset of the patients can be reliably diagnosed. Since there was limited data, the estimated performance was determined by leaving out small portions of the database (25%) for testing and training on the remaining data. This method was repeated until all of the data had been used as test data in one of the networks. The combined results on the test data then become the performance estimate. A final network was trained using all of the available data as training data.
Methodology Used

When dealing with small training sets, the holdout method is effective in providing test information useful in determining network configuration and parameter settings. In order to maximize the data available for training without a big increase in processing time, a 20% holdout was used instead of the proposed 25%. This produced 5 partitions of the data instead of 4 and made 80% of the data available for training in each partition.

To minimize the effects of the random starting weights, several networks were trained in the full training runs. In these runs, three networks were trained in each of the five partitions of data, each from a different random start. The outputs of the networks were averaged to form a consensus result that has a lower variance than could be obtained from a single network.
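The holdout scheme with consensus averaging can be sketched as below. There are five partitions, each held out in turn (20% of the data), and three networks per partition trained from different random starts, whose outputs are averaged for the held-out examples. `train` is a hypothetical callback that returns a single-output model. Assigning examples to partitions by index modulo 5 is an illustrative assumption that presumes the records were shuffled beforehand.

```python
# Sketch of 5-partition holdout with a 3-network consensus per partition.
# `train` is hypothetical; partitioning by index modulo n_partitions assumes
# the records were shuffled beforehand.
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]
Trainer = Callable[[List[Sequence[float]], List[float], int], Model]


def holdout_consensus(examples: Sequence[Sequence[float]],
                      labels: Sequence[float], train: Trainer,
                      n_partitions: int = 5,
                      nets_per_partition: int = 3) -> List[float]:
    predictions = [0.0] * len(examples)
    for p in range(n_partitions):
        test_idx = [i for i in range(len(examples)) if i % n_partitions == p]
        train_idx = [i for i in range(len(examples)) if i % n_partitions != p]
        xs = [examples[i] for i in train_idx]
        ys = [labels[i] for i in train_idx]
        # Each network in the partition starts from a different random seed.
        models = [train(xs, ys, p * nets_per_partition + k)
                  for k in range(nets_per_partition)]
        for i in test_idx:
            # Consensus: average the outputs of this partition's networks.
            predictions[i] = sum(m(examples[i]) for m in models) / len(models)
    return predictions  # each example is predicted once, as held-out test data


def demo_trainer(xs, ys, seed):
    """Hypothetical stand-in for network training: the 'model' returns its seed."""
    return lambda x: float(seed)
```

The combined held-out predictions can then be scored as a single test set, which corresponds to the performance estimate described in the Requirements above.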
2. Amount of Noise added to the inputs.

3. Amount of Error tolerance.

4. Learning algorithm used.

5. Amount of Weight decay used.

6. Number of input variables used.

A complete search of all possible combinations of 45 variables was not feasible because of the amount of CPU time needed for the tests. Test networks were trained with parameters chosen on the basis of parameters known by those of skill in the art to be important in this area and of the results of prior tests. Other sets of variables would also have been suitable. Also, as shown elsewhere herein, combinations of all of the selected 14 variables have been tested. Once the best configuration was determined, a final set of networks was trained on the complete data set of 510 patients. In the final set of networks, a consensus of eight networks was made to produce the final statistics.
Results

The final holdout training run was pat06 with 14 variables. The performance on the test data was 68.23%. The full training run was pat07, with the same network configuration as pat06. The performance on the training data was 72.9%. Statistics were generated on the last training run based on the use of a cutoff in the network output values: if the network output was below the cutoff, the example was rejected from consideration. The following table is a summary of the results for the consensus of eight networks in pat07. A test program named adzcrf was produced to demonstrate this final training run.

Cutoff   Sensitivity   Specificity   PPV       NPV       Odds Ratio   % Rejected
0.00     .828179       .598174       .732523   .723757   2.695        0
0.02     .835821       .62766        .761905   .728395   3.000        10.58
0.075    .872247       .630137       .785714   .760331   3.494        26.86
0.075    .943750       .638298       .816216   .869565   4.907        50.20
0.10     .956522       .745098       .871287   .904762   7.412        71.96

PPV = positive predictive value; NPV = negative predictive value

EXAMPLE 3

Preprocessing and Input of Western Blot Data
Requirements

The antigen data from Western Blots for the patients, as originally delivered to Logical Designs, provided information on only the peak molecular weights and their associated intensities. Analysis of this data and of the original images from which the data was taken suggests that it may be possible to use the original image, digitized in a way that could provide more information to the neural network. Examination of the original images for two experiments indicates that preprocessing of the image data decreases the variability of the position of a specific molecular weight in the image. This preprocessing will use a polynomial fit through the standards image to produce a modified image. Preprocessing of the images will also include steps to normalize the background level and contrast of the images.

Once the preprocessing is complete, the image data could be used as is, or the peak molecular weights could be extracted. From the resulting images, inputs to the neural network will be generated. As a typical image is about 1000 pixels long, methods to reduce the number of inputs will be investigated. With the image coded directly into the network inputs, using either the full or reduced dimension (resolution) images, a neural network will be trained with supervised learning to aid in determining the ranges of molecular weights that are related to the disease. This Example focuses on using the image as a whole as the input to the network.



Methodology Used

Using a correlation technique, similar features on images of Western blots were matched and a correlation plot was produced. From those plots, it was concluded that there was too much variation in matches on the correlation plots of two samples to accurately align the samples. Since each input of the network needed to accurately represent a molecular weight value, it was decided that only information from the standards image would be used for alignment of images.

A quadratic fit was performed on the standards image to generate a means to translate relative mobility information to molecular weight. After plotting a curve of relative mobility against the log of the molecular weight and examining the RSQR values, it was concluded that the quadratic fit was not accurate enough for performing this translation: the calculated weight for a standard molecule varied from gel to gel using the quadratic fit.

Several methods were tried to improve the translation of relative mobility to molecular weight, and cubic spline interpolation was selected. This method guarantees smooth transitions at the data points and is rapidly calculated. The only concern is how the method performs on values of relative mobility outside the intervals covered by the standards. If termination conditions are set properly, the extrapolation problem seems to be avoided.
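Cubic spline interpolation from relative mobility to log molecular weight can be sketched as below, using the standard tridiagonal construction. The choices here are illustrative assumptions rather than the original program's: "natural" end conditions (zero second derivative at the outermost standards) stand in for the unspecified termination conditions, mobilities must be strictly increasing, and points outside the standards reuse the nearest end polynomial.

```python
# Sketch of a natural cubic spline mapping relative mobility -> log10(MW).
# Assumptions: natural end conditions, strictly increasing mobilities, and
# extrapolation by the nearest end polynomial.
import bisect
from typing import Callable, Sequence


def natural_cubic_spline(xs: Sequence[float],
                         ys: Sequence[float]) -> Callable[[float], float]:
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    # Right-hand side of the tridiagonal system for the c coefficients.
    alpha = [0.0] * n
    for i in range(1, n - 1):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep (c[0] = c[n-1] = 0 gives the natural end conditions).
    l, mu, z = [1.0] * n, [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution for the per-interval cubic coefficients.
    b, c, d = [0.0] * (n - 1), [0.0] * n, [0.0] * (n - 1)
    for j in range(n - 2, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def spline(x: float) -> float:
        # Locate the interval; clamp so points outside use an end piece.
        j = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 2)
        t = x - xs[j]
        return ys[j] + b[j] * t + c[j] * t * t + d[j] * t ** 3

    return spline
```

Given the standards' relative mobilities and log10 molecular weights, a band at mobility `rm` would be assigned a molecular weight of `10 ** spline(rm)`.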

Using spline interpolation, the images were converted to fixed dimension training records. At this point, image intensity normalization had to be considered. Two alternatives were considered. The first was to perform no normalization. The second was to process the images so that the maximum value across the image was set to 1.0 and the minimum value was set to 0.0. Networks were trained on each alternative and the results were compared. With no noise added to the inputs, the preprocessed image network had a training example performance of 97%, while the performance with no preprocessing was 79%. When noise was added, the two alternatives gave similar results. The choice was made to use the preprocessed images for further training runs. This choice ensured that a given network input would consistently be associated with a specific molecular weight within the tolerances achievable using the Western Blot method.

Using the above choices, a series of eight neural networks was trained to provide information on the importance of different molecular weights in the prediction of the Endo present variable. In order to permit an analysis of the direction of correlation, only a single hidden processing element was used in the training. A sensitivity analysis was performed on each of the networks, and the resulting consensus was plotted using Excel.

The weights of the networks were then averaged together to generate a consensus value for each weight. Since the interconnection weight from the hidden element to the output could be either positive or negative, the weights were transformed so that all the output connections had the same sign. The weights were then averaged and the results plotted using Excel.
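The sign transformation before averaging can be sketched for single-hidden-unit networks. It assumes, since the text does not say, that the hidden activation is an odd function such as tanh. Under that assumption, negating a network's incoming weights, hidden bias and outgoing weight leaves its output unchanged, so every network can be flipped to a positive output weight without changing what it computes. Biases are omitted from the sketch for brevity.

```python
# Sketch of sign alignment before averaging single-hidden-unit networks.
# Each network is (input_weights, output_weight); biases are omitted and the
# hidden activation is assumed to be odd (e.g. tanh).
from typing import List, Sequence, Tuple

Network = Tuple[Sequence[float], float]


def align_and_average(networks: Sequence[Network]) -> List[float]:
    aligned = []
    for input_weights, output_weight in networks:
        # tanh(-z) == -tanh(z): flipping all of a network's weights when its
        # output weight is negative leaves the network output unchanged.
        sign = -1.0 if output_weight < 0 else 1.0
        aligned.append([sign * w for w in input_weights])
    # Consensus value for each input-to-hidden weight.
    return [sum(col) / len(col) for col in zip(*aligned)]
```

Without this step, two networks that learned the same relationship with opposite signs would cancel in the average, hiding the regions of positive and negative correlation.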
Results

The analysis of the Western Blot data was performed using cubic spline interpolation for image alignment to the network inputs and Max/Min image preprocessing. Given that a certain amount of variability can be expected in the accuracy of alignment of the images, due to the Western Blot methodology, this approach is believed to give better results than the polynomial fit originally used.

The plots of the sensitivity analysis and of the weights for the final consensus networks indicated that there are regions on the Western Blot that can aid in the prediction and diagnosis of the disease. The width of the regions of positive and negative correlation, as seen in the network weights, also indicates that the results shown are significant. If the peaks had been very narrow, one would have to conclude that the peaks were artifacts of the training process, similar to overtraining, and not from the underlying process being learned. The regions that appear important are as follows:

Positive Correlation:

31503.98 - 34452.12
62548.87 - 65735.97
84279.36 - 8]&458.49

Negative Correlation:

19165.9 - 20142.47
50263.36 - 93352.14
67725.77 - 78614.77

While there were a number of positive and negative peaks, these appeared to be the most likely regions for inclusion in two ELISA tests. One test will focus on the positive regions and the other on the negative regions. The two resulting values can then be combined with the patient history data as input to the neural networks.
Conclusions

The neural network was able to find regions on the Western Blot that correlate with the presence of the disease.

EXAMPLE 4

Investigate Fixed Input Dimension for Western Blot Data
Requirements

Using peak molecular weights extracted from the preprocessed image, methods to reduce the varying dimension of the Western blot data for a patient to a fixed dimension for the neural network will be investigated. This approach is desirable in that it will have substantially fewer network inputs than the full image approach. The basic problem is that the test yields a varying number of molecular weights that might be interrelated. Comparison of results from Example and this example will indicate whether patterns of molecular weights exist or whether the weights are unrelated. Since there is some variability in the weight data, the approach to the processing of this data will be similar to a fuzzy membership function, even though the classification will be performed with a neural network.
Additional Requirements

Fractions are identified from the Western blot data. Since production of these fractions is reproducible, the effectiveness of the use of this information will be determined by processing the Western blot image data into bins corresponding to the molecular weights of the fractions.

Methodology Used

From the results of Example 5, several ranges in molecular weight were determined to correlate with the disease. A reduced input representation was produced by using a Gaussian region centered on each of the peaks found in Example 5. The standard deviation of the Gaussian was determined so that the value of the Gaussian was below 0.5 at the edges of the region. The basic operation performed to generate the neural network input was a convolution between the Gaussian and the Western blot image. The calculations were all performed using the log of the molecular weight.
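One Gaussian "bin" input can be sketched as follows. The standard deviation is solved from the required edge value, and the convolution evaluated at the region center reduces to a Gaussian-weighted sum of the lane's intensities. The following are assumptions not fixed by the text: log base 10, a Gaussian centered on the middle of the region in log space, and a hypothetical edge height of 0.45 (the text only requires a value below 0.5).

```python
# Sketch of one Gaussian "bin" network input. Assumptions: log10 molecular
# weights, centre at the middle of the region, hypothetical edge height 0.45.
import math
from typing import Sequence, Tuple


def gaussian_for_region(mw_low: float, mw_high: float,
                        edge_value: float = 0.45) -> Tuple[float, float]:
    lo, hi = math.log10(mw_low), math.log10(mw_high)
    center = (lo + hi) / 2
    half_width = (hi - lo) / 2
    # Solve exp(-half_width**2 / (2 * sigma**2)) == edge_value for sigma.
    sigma = half_width / math.sqrt(-2 * math.log(edge_value))
    return center, sigma


def region_input(mws: Sequence[float], intensities: Sequence[float],
                 center: float, sigma: float) -> float:
    """Convolution of the Gaussian with a normalized lane profile, evaluated
    at the region centre (a Gaussian-weighted sum of intensities)."""
    return sum(
        inten * math.exp(-((math.log10(mw) - center) ** 2) / (2 * sigma ** 2))
        for mw, inten in zip(mws, intensities)
    )
```

For the first positive-correlation region, `gaussian_for_region(31503.98, 34452.12)` produces a Gaussian whose value is 0.45 at both ends of the region.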

A separate software program was produced. The program
performed the convolution on the normalized images with respect to
molecular weight and intensity. The parameters for calculation of the
network inputs are contained in a table in the binproc program. In
binproc the mean and standard deviation are stored in the table. The program
is recompiled when the table values are changed. The program has a test
mode that produces an output file that allows the gaussians to be plotted
and compared to the Western Blot images, using Excel. Plots of regions
are included in the documentation.

When working with 36 fractions, binproc.c was again modified to
translate the positions of the fractions into table values for binproc. This
modified program is called fproc.d. Its purpose is to perform the spline
interpolation required to normalize the molecular weight value based on
the standards. Binproc2.c was produced from binproc, replacing the
mean and standard deviation tables with minimum and maximum tables which
correspond to the endpoints of the fractions in the files supplied.
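The text states only that fproc performs a spline interpolation to normalize molecular weight against the standards. The following sketch assumes a natural cubic spline of log10(molecular weight) against migration distance in the marker lane; the input format and function names are hypothetical, and points outside the range of the standards are extrapolated from the end pieces, which may be unreliable.

```python
import math


def natural_cubic_spline(xs, ys):
    """Return f(x) interpolating (xs[i], ys[i]) with a natural cubic spline
    (second derivative zero at both ends). xs must be strictly increasing."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the second derivatives m[0..n]; rows 0 and n pin m to 0.
    sub, diag = [0.0] * (n + 1), [1.0] * (n + 1)
    sup, rhs = [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        sub[i], diag[i], sup[i] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n + 1):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * (n + 1)
    m[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):
        m[i] = (rhs[i] - sup[i] * m[i + 1]) / diag[i]

    def f(x):
        i = 0
        while i < n - 1 and x > xs[i + 1]:
            i += 1  # locate the interval; outside the knots the end pieces extrapolate
        hi = h[i]
        t_right, t_left = xs[i + 1] - x, x - xs[i]
        return ((m[i] * t_right ** 3 + m[i + 1] * t_left ** 3) / (6.0 * hi)
                + (ys[i] / hi - m[i] * hi / 6.0) * t_right
                + (ys[i + 1] / hi - m[i + 1] * hi / 6.0) * t_left)

    return f


def mw_calibration(standards):
    """standards: (migration_distance, known_molecular_weight) pairs for the
    marker lane (hypothetical format). Returns distance -> estimated MW,
    interpolating log10(MW) with the spline above."""
    pts = sorted((dist, math.log10(mw)) for dist, mw in standards)
    spline = natural_cubic_spline([p[0] for p in pts], [p[1] for p in pts])
    return lambda dist: 10.0 ** spline(dist)
```

A calibration function built from each gel's standards lane can then be applied to every band position on that gel before the gaussian or min/max bins are evaluated.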

In order to test any of the data files produced by the above
programs, the holdout method was used, with 80% of the data being
used for training and the remaining 20% used for testing. Once the
training data are produced from the Western Blot data, a random number
column and the Patient ID column were added in the Excel spreadsheet.
The data were then sorted on the random number column. This in effect
shuffles the data. In this way, it is likely that each partition has examples
from each of the gels. With these percentages, five separate training and
test files were produced so as to allow network performance to be
estimated from the combined test set results.
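The shuffle-and-partition step above can be sketched in Python as follows. Attaching a random key to each row and sorting on it mirrors the random-number column added in Excel. The sketch assumes the five test sets are disjoint, each holding about 20% of the rows (a five-fold arrangement); the text gives the 80/20 split and the five files but does not state this explicitly. The function name and seed are illustrative.

```python
import random


def holdout_partitions(rows, n_partitions=5, seed=0):
    """Shuffle rows by sorting on a random key, then yield (train, test) pairs.

    Each test block is a disjoint slice of about len(rows) / n_partitions rows
    (20% for five partitions); the remaining rows form the training set."""
    rng = random.Random(seed)
    keyed = [(rng.random(), row) for row in rows]  # the "random number column"
    keyed.sort(key=lambda pair: pair[0])           # sorting on it shuffles the rows
    shuffled = [row for _, row in keyed]
    size = len(shuffled)
    for k in range(n_partitions):
        lo, hi = k * size // n_partitions, (k + 1) * size // n_partitions
        yield shuffled[:lo] + shuffled[hi:], shuffled[lo:hi]
```

With disjoint test blocks, every patient is tested exactly once, so the five test results can be pooled into a single performance estimate.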

Using ThinksPro™ the number of inputs used by the network could 
be varied by excluding inputs. Excluded inputs are not presented to the 
network during training. Using the sensitivity analysis as a guide, 
unimportant inputs are eliminated. Decreasing the dimension of the input 
space becomes even more important when the number of training 
examples is small. This method is the same as was used for eliminating 
variables in the patient history training runs. At the current time, this 
process is performed manually. 
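The text says the exclusion of inputs was performed manually. A hypothetical automation of the same idea, a backward elimination that drops the input with the lowest sensitivity as long as test error does not rise, might look like the following. `train_and_score` is an assumed stand-in for a ThinksPro training run plus sensitivity analysis; it is not a ThinksPro API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical training-run signature: included inputs -> (test error, {input: sensitivity})
TrainFn = Callable[[List[int]], Tuple[float, Dict[int, float]]]


def prune_inputs(inputs: List[int], train_and_score: TrainFn, min_inputs: int = 1):
    """Repeatedly exclude the least sensitive input while test error does not increase."""
    current = list(inputs)
    best_err, sens = train_and_score(current)
    while len(current) > min_inputs:
        weakest = min(current, key=lambda i: sens[i])  # ties: first in list order
        candidate = [i for i in current if i != weakest]
        err, cand_sens = train_and_score(candidate)
        if err > best_err:
            break  # removing the weakest input made things worse; stop here
        current, best_err, sens = candidate, err, cand_sens
    return current, best_err
```

Because each step retrains and recomputes sensitivities, the ranking can change as inputs are removed, which matches the observation elsewhere in this document that a variable's importance depends on the other variables in the run.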
Results 

In Example 5, networks trained on all the data were used to
determine what ranges of molecular weights were important to the
classification process. In this Example, the holdout method was used to
train networks so that the test set performance could be estimated. The
first set of tests was based on regions identified in Example 5. The
second set of tests was made using the fractions identified in the four
ishgel files.

The initial consensus runs based on the top six regions found in
Example 5 yielded poor performance (50%). Analysis of the generated input
data indicated that the regions used to generate the input data
were too narrow to capture the important information from the image
data. The regions were widened, and the top ten regions from Example 5
were included instead of the top six. A test on the ten wider regions
indicated slightly better performance. Using sensitivity analysis, three of
the ten regions were eliminated and a complete test was run. The
performance on the six of the ten wider regions improved to 54.5%.

As the number of inputs to the network was further decreased, the
test set performance (estimated with the holdout method) continued to
increase. The best performance was achieved using only one of the
regions, with molecular weight ranging from 66392.65 to 78614.74. The
performance estimate was 58.5% on test data using the holdout method.

This process was applied again using, as a start, 36 regions based 
on identified fractions. There was a great deal of overlap in the 36 
fractions. The top 7 fractions were determined from the 36 using 
sensitivity analysis. Similar performance of 58% was achieved using the 
subset of the fractions. 
Conclusions 

None of the tests yielded very high results. The primary
reason for this is likely to be the limited amount of training data available
for this Example. Results from previous Examples indicated that as the
number of patients in the training sample decreased, the performance
on validation data also decreased. This relationship is illustrated in the
following table.

Network                                  Patients   Estimated Performance (%)
Patient History                          510        68.23 (pat06)
ELISA set, patient history data only     350        62.76
Western Blot                             200        58.0
Better results were achieved on the ELISA/patient historical data when
the ELISA variables were included, even with the reduced number of patients.
This showed the value of the ELISA variable.

It appears that a number of regions can be determined as being
important to the classification of the disease. Substantially different sets
of regions have yielded similar results, indicating that there may be
patterns in the Western Blot data that indicate the presence of the
disease. With a small database of patients, it is more difficult to isolate
these patterns.
It is clear that increasing the size of the database for Western Blot
data will improve the performance of the networks trained on this data.
When combining Western Blot data with patient historical data, the input
dimension of the network will increase. The increase in input dimension
usually requires more training examples to ensure generalization.

EXAMPLE 5 
Train Networks Using Western Blot Data 

The purpose of this example was to train a set of networks to
determine the performance estimate for the diagnosis using only Western
Blot data. Experiments were run to determine the best configuration and
parameters for training of the networks. The method described in
Example 2 above was used for this performance estimate. A final
network was trained using all of the available data as training data. The
output of this trained network (antigen index) was used as an input to the
network generated in the combined data phase.
Methodology Used 

Several methods were used to find the best performing set of
inputs for the available training data. From previous Examples, the use of
the sensitivity analysis was shown to produce good results in identifying
the importance of each of the input variables. A number of networks
were trained on combinations of variables manually determined from the
sensitivity analysis.

In preparing an automated procedure, a 2x2 contingency table
Chi-square analysis of the variables was used to provide an
alternative ranking of the importance of the variables. Since the inputs
were continuous, a threshold was used for each input to generate the
information needed for the contingency table. The Chi-square value
varies depending on the setting of the threshold. The threshold value
used in the ranking of variables was chosen to maximize the Chi-square
statistic.
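A minimal sketch of the threshold-maximized 2x2 Chi-square ranking follows. It assumes a Pearson Chi-square without continuity correction, with candidate thresholds taken at each observed input value; the text specifies neither choice. Labels are 1 for disease present and 0 for absent, and the function names are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson Chi-square (no continuity correction) for [[a, b], [c, d]]:
    rows = input above / at-or-below threshold, columns = disease yes / no."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def best_threshold_chi_square(values, labels):
    """Try each distinct input value as the threshold and return
    (largest Chi-square, threshold that produced it)."""
    best = (0.0, None)
    for t in sorted(set(values)):
        a = sum(1 for v, y in zip(values, labels) if v > t and y == 1)
        b = sum(1 for v, y in zip(values, labels) if v > t and y == 0)
        c = sum(1 for v, y in zip(values, labels) if v <= t and y == 1)
        d = sum(1 for v, y in zip(values, labels) if v <= t and y == 0)
        chi2 = chi_square_2x2(a, b, c, d)
        if chi2 > best[0]:
            best = (chi2, t)
    return best


def rank_inputs(columns, labels):
    """columns: {input_name: [values]}. Names sorted by maximal Chi-square, highest first."""
    scores = {name: best_threshold_chi_square(vals, labels)[0] for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Because the threshold is chosen to maximize the statistic for each input separately, the resulting values are best used for ranking rather than as formal significance tests.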

The training runs made during the development of the automated
procedure were chosen from these rankings. At the time that the training
runs were made, an automated procedure had not been formulated. To
save on overall processing time, only one partition of the training data
was used. Combinations of variables that performed well in the first
partition of the training and test data were then tried on the remaining
partitions.

One method suggested in the literature for finding the best set of
inputs has been to use a genetic algorithm to determine the highest
performing set of inputs. Genetic algorithms typically require thousands
of iterations to converge to a good solution. In working with the Western
Blot data, this would represent a large amount of computer time, even
with the small training example size. For 10 variables, an enumeration of
all combinations would require 1024 training runs. An alternative to the
genetic algorithm was attempted. In this alternative method, a neural
network was trained to predict the test set RMS error based on the set of
inputs chosen. The training examples used for this experiment were the
results of training runs on the first partition of the Western Blot data. The
predictor network was then tested with all combinations to determine the
combination with the lowest predicted error. That input combination was then used
to train a network on the Western Blot data. The main drawback of this
method and the genetic algorithm approach is that the sensitivity analysis
information, found to be very effective, is ignored in the process.
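The exhaustive search over the 1024 on/off input combinations that the predictor network made affordable can be sketched as below. `predict_rms` is a hypothetical stand-in for the trained predictor network: it maps a 10-element 0/1 mask to a predicted test set RMS error. The empty combination is skipped, so 1023 masks are scored.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple


def best_predicted_subset(
    predict_rms: Callable[[Sequence[int]], float], n_inputs: int = 10
) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Score every non-empty on/off input mask with a predictor of test-set
    RMS error and return (mask with the lowest prediction, that prediction)."""
    best_mask, best_rms = None, float("inf")
    for mask in product((0, 1), repeat=n_inputs):
        if not any(mask):
            continue  # a network needs at least one input
        rms = predict_rms(mask)
        if rms < best_rms:
            best_mask, best_rms = mask, rms
    return best_mask, best_rms
```

Each call to the predictor costs far less than a real training run, which is what makes scoring all combinations practical; only the winning mask is then used for an actual training run.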
Results 

The basic rankings for the 10 variables (bins) in the Western Blot 
Data are based on a consensus of 8 networks trained on the full database 
of 200 examples. The results are as follows: 
7 : 1.182073
9 : 1.055611
3 : 1.053245
8 : 1.039028
6 : 1.027239
10 : 1.023135
4 : 0.978769
5 : 0.952821
2 : 0.899936
1 : 0.788143

The rankings of the 10 variables based on the Chi-square analysis 
are as follows: 



3 : 4.380517
9 : 3.751625
7 : 3.372731
2 : 3.058437
6 : 3.022164
5 : 2.787982
10 : 1.614931
4 : 1.225725
1 : 0.975502
8 : 0.711958



During the analysis of the Western Blot data, a number of networks 
were trained on the first partition(s) of the training data. The test results, 
which are ranked below, show the variables that were included in the 
training run. 
Variables
  1  2  3  4  5  6  7  8  9 10   Test Error
  0  0  0  1  0  1  0  0  1  1   0.49291
  0  0  1  0  0  0  1  0  1      0.49374
  0  0  0  0  0  1  0  0  1  1   0.49831
  1  0  0  0  0  0  1  0  0  1   0.50036   (11)
  0  0  1  0  0  0  0  0  0      0.50145
  0  0  0  0  0  0  1  0  0  1   0.50164   (13)
  0  0  0  0  0  0  0  0  0  1   0.50174   (5)
  0  0  0  0  0  0  1  0  0  0   0.50182
  0  0  0  0  1  0  0  0  0  0   0.50295   (8)
  0  0  1  0  1  0  1  0  0  0   0.50285
  0  0  0  0  0  0  0  0  1  0   0.50323
  0  0  0  0  0  1  0  0  0  0   0.50360   (7)
  0  0  1  0  0  0  1  0  0  1   0.50587
  0  1  1  0  0  0  0  1  0  0   0.50682   (3)
  0  0  1  0  0  0  0  0  1  0   0.50707
  0  0  0  1  1  1  0  0  1  1   0.50823
  1  0  0  0  1  1  1  0  0  0   0.50853   (2)
  0  0  0  0  1  0  1  0  0  0   0.50995   (1)
  1  1  0  0  0  0  0  0  0  0   0.51158   (9)
  0  0  0  0  0  0  1  0  1  0   0.51163
  0  0  0  0  1  1  0  0  0  0   0.51372   (10)
  0  0  0  0  0  0  0  1  1  0   0.51909
  1  0  0  1  0  0  0  1  0  0   0.52084   (4)
  1  1  0  0  1  0  1  0  0  1   0.52950
  0  1  0  0  0  0  1  0  1  0   0.52978
  0  0  0  1  0  1  0  1  0  0   0.53196
  0  0  1  1  0  0  0  1  1  0   0.54841
  0  0  1  1  0  0  1  0  1  0   0.54934   (14)
  0  1  0  1  0  0  0  0  0  1   0.55178   (12)
  1  1  0  0  0  1  1  0  0  0   0.55290   (6)
  1  0  1  0  0  0  0  1  0  1   0.56664
  0  0  1  1  0  1  0  1  1  1   0.59937

A number in parentheses indicates that the combination was generated by
the prediction network training process.

In looking at the above test runs, it is clear that the more important
variables in the rankings contributed to lower test set errors and that the
more variables included, the poorer the test set results. This shows the
importance of choosing the best subset of variables in developing a high
performance neural network.

Several combinations of variables were used to train networks on
all partitions of the training data. The results of these runs are shown
below.

VARIABLES     TEST SET PERFORMANCE
3             57.5%
3,9           53.5%
3,7,9         53.0%
4,6,9,10      57.0%


Since both rankings of variables showed 3, 7, and 9 as important,
it is likely that this combination would give higher than 57.5% provided
there was enough training data. Training example performance for this
combination was 63.9%, showing the level of overtraining that occurred.
Several of the first partition networks shown above had combinations of
variables chosen by a neural network trained to predict the test
performance. Those networks are indicated by a number in the last
column. This number indicates the sequence in which tests were run.
Combinations without a number were chosen manually from the rankings.
It is likely that if this process were continued, the predictor network
would eventually find the best combination. Since there are many factors
that can affect the test set performance, it is likely that there is a lot of
"noise" in the test set results. For this method to work better, a
consensus approach may be needed to generate the training values for
the predicted test set error. This problem would also be seen when using
the consensus approach.
Conclusion 

The process of using the sensitivity and contingency table rankings
of variables is an effective and efficient technique for picking a set of
variables to maximize the neural network performance. The top 3
variables under both rankings were the same, indicating that these
methods are performing well. This method appears to work with the
Western Blot data and should work well on other forms of data, making this
a general purpose neural network technique that can also be applied to
patient history data.

The above results indicate more data would improve the level of 
performance. The sensitivity analysis shows little variation in the relative 
values of variables. Most of the variables contribute to the solution. This 
should be expected since the bins were chosen based on an analysis of 
neural network weights trained on the full Western Blot images. By using 
all or most of the variables, however, the neural networks quickly get into 
an overtraining situation. This can be avoided by adding data to the 
training example. 

Tests with the neural network guided selection of variables proved
to be less effective than the ranking approach. While the ranking
approach was clearly the most effective, the neural network guided
approach should also be able to eventually discover the best set of
variables. Because it is a more direct approach than a genetic algorithm,
it is likely to perform better than the genetic algorithm on similar data.
The major drawback of this method is that it does not use the sensitivity
analysis information to aid in the search.

EXAMPLE 6 
Combine Patient History and ELISA Data 
Requirements 

Using the processing developed in the above examples, train a set 
of networks on the combination of Patient History Data and ELISA Data. 
An index generated from an ELISA test, based on the use of the entire set 
of antigens will be used to determine the improvement in performance 
achieved by combining this information with the patient historical data. 
Additional Requirements 

In addition to the above requirements, a comparison among the
data from several ELISAs (ELISA 100, ELISA 200, and ELISA 2) and
an analysis of the interrelationships of variables were performed
to help determine to which variables the original ELISA tests related.
Methodology Used 

In order to determine the improvement in the diagnostic test
performance achieved by the inclusion of ELISA test results, several
training runs were made using the holdout method described in EXAMPLE 2.
The partitions of the data were made so that in each partition, 80% of
the data was used for training and the remaining 20% was used for
testing.

To minimize the effects of the random starting weights, several
networks were trained in the full training runs. In these runs three
networks were trained in each of the five partitions of data, each from a
different random start. The outputs of the networks were averaged to
form a consensus result that has a lower variance than could be obtained
from a single network. Since the number of patients for which all forms
of ELISA data were available was 325, new training runs with the original
14 variables were made to provide an accurate means of comparing the
effects of the ELISA data on the diagnosis of the disease. Analysis of the
ELISA 2 data showed a large range of values for the test. Plots showing
the relation of ELISA 2 to the ELISA 100 data suggested that the log of
the ELISA 2 data might be better than the raw value.
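The consensus step, averaging the outputs of several networks trained from different random starts, and the log transform suggested for ELISA 2 can be sketched as follows. The base-10 logarithm and the small floor that guards against a zero reading are assumptions; the text specifies neither. Function names are hypothetical.

```python
import math
from statistics import fmean


def consensus(outputs_per_network):
    """Average several networks' outputs example by example.

    outputs_per_network: one list of outputs per network, all in the same
    example order. Returns one averaged output per example."""
    return [fmean(vals) for vals in zip(*outputs_per_network)]


def log_elisa2(raw, floor=1e-6):
    """log10 of a raw ELISA 2 value; `floor` (assumed) avoids log(0)."""
    return math.log10(max(raw, floor))
```

Averaging independent random starts reduces the variance contributed by the starting weights, and the log compresses the large range of ELISA 2 values noted above.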

The comparison training runs were organized as follows:

Run 1: ELISA 100, ELISA 200, log(ELISA 2) and the original 14 variables.
Run 2: log(ELISA 2) and the original 14 variables.
Run 3: The original 14 variables.



After making these comparison runs, a final set of networks was
trained on the complete data set of 325 patients. In the final set of
networks, a consensus of eight networks was made to produce the final
statistics. The final run statistics are reported only on the training data
and represent an upper bound on the true performance. The results from
the last holdout run represent a possible lower bound on the performance.

From the training data, each of the 65 variables, including those not
available for a diagnosis, was built into a training example of 325 training
examples. The TrainDos training program was modified to automate the
generation of networks to provide relationships between the variables. In
each of 65 networks, one of the variables was predicted by the remaining 64.
A sensitivity analysis was performed for each network to indicate the
importance of each variable in making the prediction.
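The loop structure of this variable relationship analysis, in which each variable is predicted from all the others and the inputs are ranked, can be sketched as follows. As a deliberately simple stand-in for training a network and running a sensitivity analysis, the sketch ranks inputs by absolute Pearson correlation with the target. Correlation is symmetric, whereas the text reports that the network-based relationships were not, so this stand-in illustrates only the bookkeeping, not the results. Function names are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def relationship_table(data: Dict[str, List[float]]) -> Dict[str, List[str]]:
    """For each variable, rank the remaining variables by importance in
    predicting it. |Pearson r| stands in for network sensitivity here."""
    table = {}
    for target, y in data.items():
        others = {name: abs(pearson(x, y)) for name, x in data.items() if name != target}
        table[target] = sorted(others, key=others.get, reverse=True)
    return table
```

Replacing the correlation with a per-target network and its sensitivity analysis, as the text describes, would let the rankings become asymmetric and depend on which other variables are present.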
Results

The consensus results for the three comparison runs are as follows: 






24727-801 F 



Run 1: All ELISA variables (CRFEL1)   66.46%
Run 2: Log of ELISA 2 (CRFEL2)        66.77%
Run 3: No ELISA variables (CRFEL0)    62.76%

Comparison of Run 1 and Run 2 shows that the addition of ELISA 100 and
ELISA 200 data to ELISA 2 data had no effect. Therefore, the ELISA 100 and
ELISA 200 variables could be eliminated.

Comparisons of Run 2 and 3 showed that an input based on the 
ELISA test improved the diagnosis of the disease. 

Comparison of Run 3 to pat06 showed a 5.47% drop in test
performance. This could only be due to the decrease in the number of
patients available for training. This also suggests that an increase in
training data above 500 is likely to have a significant effect on the
performance of the neural network on test data.

Based on these results, final networks were trained. Eight
networks were trained on the 325 patients. The performance on the
training data was 72.31%. While this result is similar to that of the pat07
runs, it is clear that the improvement due to the ELISA 2 data is being offset
by a decrease in the amount of available training data.

Sensitivity Analysis results showed that the ELISA 2 variable
ranked 7th out of the 15 variables used.

Plots of the hidden processing element outputs were made from 
the log files of the eight trained networks. A means was found so that 
the desired output could be indicated on the plots. By comparing the 
eight networks, it is clear that each performed the task in a different way. 
Some clustering of data points is seen in a few plots. Since this did not 
occur consistently, no conclusions could be drawn. 

Statistics were generated on the last training run based on the use
of a cutoff in the network output values. If the network output was
below the cutoff, the example was rejected from consideration. The
following table is a summary of the results for the consensus of eight
networks in CRFEL2.



Cutoff        0.00     0.02     0.05     0.075    0.10
Sensitivity   .7790    .7911    .9016    .9596    .9595
Specificity   .6549    .6552    .7200    .7755    .8529
PPV           .7421    .7576    .8397    .8962    .9342
NPV           .6992    .6972    .8181    .9048    .9063
Odds Ratio    2.63     2.75     4.96     8.86     12.5
% Rejected    0.62     15.69    39.38    54.46    66.76



In general, these results are better than the results for pat07. 
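Cutoff statistics of the kind tabulated above can be computed along the following lines. The text says only that an example was rejected when the network output was below the cutoff; this sketch assumes that refers to the distance of the consensus output from a 0.5 decision threshold. The odds ratio row is omitted because the text does not define how it was calculated. Names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CutoffStats:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    pct_rejected: float


def stats_at_cutoff(outputs, labels, cutoff, threshold=0.5):
    """outputs: consensus network outputs; labels: 1 = disease present, 0 = absent.

    ASSUMPTION: an example is rejected when abs(output - threshold) < cutoff;
    the remaining examples are called positive when output >= threshold."""
    tp = tn = fp = fn = rejected = 0
    for out, y in zip(outputs, labels):
        if abs(out - threshold) < cutoff:
            rejected += 1
            continue
        positive = out >= threshold
        if positive and y == 1:
            tp += 1
        elif positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def frac(num, other):
        # num / (num + other), or NaN when there are no such examples
        return num / (num + other) if num + other else math.nan

    return CutoffStats(
        sensitivity=frac(tp, fn),
        specificity=frac(tn, fp),
        ppv=frac(tp, fp),
        npv=frac(tn, fn),
        pct_rejected=100.0 * rejected / len(outputs),
    )
```

Sweeping `cutoff` over values such as 0.00, 0.02, 0.05, 0.075 and 0.10 shows the trade-off in the table: accuracy on the accepted cases rises while the share of rejected cases grows.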

A test program named adzcrf2.exe (see Appendix II) was produced
as a demo of this final training. This program permits the running of
pat07 and CRFEL2 based on the value input in the ELISA field. A value of
0 in the field causes pat07 to be used.

The analysis of variable relationships was performed. Based on the 
analysis of the relationships, the variables which showed Endo Present as 
a contributing factor were compared to the variables used in predicting 
Endo. Results of training two networks (PATVARSA and PATVARS3) 
showed that in the case of Endo, relationships were not symmetric, as 
they are when using correlation. CRFVARSA.XLS was built from the 
sensitivity analysis results to summarize the results. These results show 
the nonlinear nature of the relationships. The importance of a variable is 
affected by the other variables in the training run. This suggests that a 
means of eliminating unimportant variables in an automated fashion may 
be required to increase the usefulness of this analysis. 

Analysis of the variable relationships (CRFVAR00 to CRFVAR64)
showed that in most cases, the log of the ELISA 2 test had greater
significance than the raw ELISA 2 value. In particular, the log value
ranked higher for predicting both Endo Present and AFS Stage.
Conclusions

The ELISA 2 test adds to the predictive power of the neural
network. The ELISA 2 test has eliminated the need for the original ELISA
tests. Based on this result, it is likely that results of the work with the
Western Blot data will further improve the power of the neural network
diagnostic test.

The effects of increased training data could be clearly seen in the
comparison of Run 3 to pat06. The difference in performance suggests
that the performance of the neural network will increase substantially
with an increase in training data. It appears from the comparison that
doubling the data could improve the performance by 10-15%. With 8-10
times the data the performance might increase to 75-80%.



EXAMPLE 7
Patient History Stage/AFS Score Training
Requirements

Using methods developed in the above Examples, identify relevant
variables for either the stage of the disease or the AFS Score. The
selection of the target output variable to be used will be determined by a
comparison of test set performance from a training run using the phase 1
list of important patient history variables. Once the list of important
variables is selected, a consensus of 8 neural networks will be trained
on the 510 patient database.

Training examples were built for the Stage desired output and the
AFS score desired output. There were 7 patients missing Stage
information and 28 patients missing Score information. For the stage
variable, the average value of 2.09 was used where the data was
missing. For score, the missing data was replaced with a value
depending on the value of the stage variable. For stage 1, a score of 3
was used. For stage 2, 10.5 was used. For stage 3, 28 was used, and
for stage 4 the value 55 was used.

Stage and score were reprocessed so that the desired output would
fall in the range of 0.0 to 1.0. Stage was translated linearly. Two
methods were used for score. The first was the square root of the score
divided by 12.5. The second was the log of score + 1 divided by the log
of 150.
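The missing-value replacement and output scaling described above can be sketched as follows. The fill values and the two score transforms come from the text; mapping stages 1 to 4 onto 0.0 to 1.0 is an assumed reading of "translated linearly", and the handling of a record missing both stage and score is also an assumption. Because log(score + 1)/log(150) is a ratio of logarithms, the result does not depend on the logarithm base.

```python
import math

STAGE_FILL = 2.09                                  # average stage, from the text
SCORE_FILL = {1: 3.0, 2: 10.5, 3: 28.0, 4: 55.0}   # score used for each stage, from the text


def fill_missing(stage, score):
    """Replace missing (None) stage/score values as described in the text.
    ASSUMPTION: when both are missing, the score is looked up from the
    imputed stage rounded to the nearest whole stage."""
    if stage is None:
        stage = STAGE_FILL
    if score is None:
        score = SCORE_FILL[min(4, max(1, round(stage)))]
    return stage, score


def scale_stage(stage):
    # ASSUMPTION: the linear translation maps stage 1..4 onto 0.0..1.0.
    return (stage - 1.0) / 3.0


def scale_score_sqrt(score):
    return math.sqrt(score) / 12.5                  # first method in the text


def scale_score_log(score):
    return math.log(score + 1.0) / math.log(150.0)  # second method (the one chosen)
```

The log transform compresses the long upper tail of the score, which may be one reason it was preferred for the remainder of the Example.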

The holdout method was used to train networks on stage, square
root score and log of score. These networks were trained using 45
variables. The results were compared to determine which variable and
processing would be used for the remainder of the Example. The log of
score was chosen.

At this point the procedure for isolating the set of important
variables was begun. Eight networks were trained on the full training
example and the consensus sensitivity analysis was generated to produce
the first ranking of the variables. Then the Chi Square contingency table
was generated to produce the second ranking of the variables. The
procedure for isolating important variables was started manually, but was
found to be too time consuming. The procedure was implemented as a
computer program and was run on a computer for about one week before
completing.
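The text does not spell out the automated variable-selection program. One hypothetical version merges the sensitivity and Chi Square rankings by mean rank position and then adds variables greedily in that order, keeping each only if test error improves. `train_and_test` is an assumed stand-in for a holdout training run, and the merging rule is an assumption, not the procedure actually used.

```python
def combined_order(sensitivity_rank, chi_square_rank):
    """Merge two rankings (input ids, most important first) by mean position.
    Python's sort is stable, so ties keep the sensitivity order."""
    pos_s = {v: i for i, v in enumerate(sensitivity_rank)}
    pos_c = {v: i for i, v in enumerate(chi_square_rank)}
    return sorted(sensitivity_rank, key=lambda v: (pos_s[v] + pos_c[v]) / 2.0)


def forward_select(order, train_and_test):
    """Add inputs in the given order, keeping each one only if the
    (hypothetical) train_and_test(inputs) -> test_error call improves."""
    chosen, best = [], float("inf")
    for v in order:
        err = train_and_test(chosen + [v])
        if err < best:
            chosen, best = chosen + [v], err
    return chosen, best
```

Applied to the two Western Blot bin rankings listed under Example 5 above, `combined_order` returns `[7, 9, 3, 6, 10, 2, 8, 5, 4, 1]`, with bins 3, 7 and 9 at the top, as noted in that Example's conclusion.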

From the results of the variable selection, a set of eight networks 
were trained on the full training example. The consensus results were 
analyzed and compared to the Endo present results. 
Results

The sensitivity analysis of all 45 variables gave the following 
ranking of variables: 
Name                            Input #
Past history of Endo            33
Hist of Pelvic Surgery          34
Dysmenorrhea                    40
#Birth                          11
age(preproc)                    1
Medication History              35
#Preg                           10
Ectopic Pregnancy               26
Pelvic Pain                     37
Adnexal masses/Thickening       43
#Abort                          12
Current Exog. Hormones          36
Hormone induced                 18
Other history                   32
Infertility                     42
Packs/Day                       8
Uterine/Tubal Anomalies         24
Dyspareunia                     41
Anovulatory                     15
Hist of Infert                  13
Unknown                         16
Menstrual Abnormalities         39
Gyn Cancer                      31
Abnormal PAP/Dysplasia          30
Other Symptoms                  45
Herpes                          19
Ovulatory                       14
Genital Warts                   20
Fibroids                        25
Ovarian Cysts                   28
Drug Use                        9
Dysfunctional Uterine Bld.      27
PID                             23
Hypertension                    4
Vag Infections                  22
Undetermined                    44
Other STD                       21
Preg HTM                        5
Autoimmune Disease              6
Abdominal Pain                  38
Preg Induced DM                 3
Oligoovulatory                  17
Diabetes                        2
Polycystic Ovarian Synd         29
Transplant                      7


The Chi Square analysis gave the following ranking of variables: 



Name                            Input #
Past history of Endo            33
#Preg                           10
#Birth                          11
Packs/Day                       8
Dysmenorrhea                    40
Pelvic Pain                     37
Preg HTM                        5
Hist of Infert                  13
Abnormal PAP/Dysplasia          30
Infertility                     42
Diabetes                        2
Herpes                          19
Hist of Pelvic Surgery          34
#Abort                          12
Other Symptoms                  45
Medication History              35
Undetermined                    44
Dysfunctional Uterine Bld.      27
Gyn Cancer                      31
Uterine/Tubal Anomalies         24
Polycystic Ovarian Synd         29
Dyspareunia                     41
Genital Warts                   20
Adnexal masses/Thickening       43
Oligoovulatory                  17
Autoimmune Disease              6
Abdominal Pain                  38
Unknown                         16
Ectopic Pregnancy               26
Ovulatory                       14
Fibroids                        25
Current Exog. Hormones          36
Ovarian Cyst                    28
Drug Use                        9
Vag Infections                  22
Preg Induced DM                 3
PID                             23
age(preproc)                    1
Hormone induced                 18
Anovulatory                     15
Menstrual Abnormalities         39
Hypertension                    4
Other STD                       21
Transplant                      7
Other history                   32



The variables chosen in the variable selection procedure were as follows, 
showing the ranking from the final sensitivity analysis: 



Name                            Input #
Past history of Endo            33
#Preg                           10
Hist of Pelvic Surgery          34
age(preproc)                    1
Dysmenorrhea                    40
Ectopic Preg.                   26
Oligoovulatory                  17
The comparison of the score network to the Endo present network can be
performed by forcing a threshold on the desired score output to produce
an Endo present comparison. The results for the score and the pat07
networks are shown below.



NETWORK       PAT07       SCR07
Sensitivity   .828179     .679525
Specificity   .598174     .647399
PPV           .732523     .789655
NPV           .723757     .509091
Odds Ratio    2.695       2.017751



Conclusions 

The set of variables identified in this Example appears to be
reasonable.

The automated variable selection methodology appears to function 
properly. The choice of variables is well predicted by the sensitivity 
analysis. 

Now that there are two methods for predicting the disease, the 
Endo present network and the Score network could be combined to 
improve the reliability of the prediction. 

EXAMPLE 8 
Patient History Adhesions Training 
Requirements 

Using methods as outlined in EXAMPLE 7, identify relevant 
variables for Adhesions target output variable. This target output variable 
will be run using the phase 1 list of important patient history variables. 
This will also permit a comparison of the new outputs to the Endo present 
target variable used in phase 1. Once the list of important variables is
selected, a consensus of 8 neural networks will be trained on the 510
patient database.
Methodology Used

Training data for the adhesions variable was generated in the same 
manner as for EXAMPLE 7. The adhesions variable generated two output 
variables in a manner similar to that used for Endo present. At this point 
the procedure for isolating the set of important variables was begun. 
Eight networks were trained on the full training example and the 
consensus sensitivity analysis was generated to produce the first ranking 
of the variables. Then the Chi Square contingency table was generated 
to produce the second ranking of the variables. The procedure for 
isolating important variables was started manually, but was found to be 
too time consuming. The procedure was implemented as a computer 
program and was run on a computer for about one week before 
completing. 

From the results of the variable selection, a set of eight networks 
were trained on the full training example. The consensus results were 
analyzed and compared to the Endo present results. 
Results 

The sensitivity analysis of all 45 variables gave the following 
ranking of variables: 



Name                                    Input #
Hist of Infert                          13
Medication History                      35
Ectopic Pregnancy                       26
Packs/Day                               8
Hist of Pelvic Surgery                  34
Infertility                             42
#Birth                                  11
Dyspareunia                             41
Hormone Induced                         18
Past history of Endo                    33
Herpes                                  19
#Preg                                   10
Current Exog. Hormones                  36
age(preproc)                            1
Dysmenorrhea                            40
Uterine/Tubal Anomalies                 24
Anovulatory                             15
Other history                           32
Ovarian Cyst                            28
Pelvic Pain                             37
Gyn Cancer                              31
Ovulatory                               14
Menstrual Abnormalities                 39
#Abort                                  12
Unknown                                 16
Abnormal PAP/Dysplasia                  30
Abdominal Pain                          38
Adnexal masses/Thickening               43
Fibroids                                25
Pelvic Inflammatory Disease (PID)       23
Dysfunctional Uterine Bld               27
Vag Infections                          22
Drug Use                                9
Other Symptoms                          45
Genital Warts                           20
Autoimmune Disease                      6
Hypertension                            4
Other STD                               21
Preg Induced DM                         3
Preg HTM                                5
Polycystic Ovarian Synd                 29
Oligoovulatory                          17
Diabetes                                2
Undetermined                            44
Transplant                              7


The Chi Square analysis gave the following ranking of variables:

Name                                 Input #
Hist of Infert                       13
Infertility                          42
Medication History                   35
Herpes                               19
Ovulatory                            14
Dysmenorrhea                         40
Dyspareunia                          41
Packs/Day                            8
Ectopic Preg.                        26
Current Exog. Hormones               36
Menstrual Abnormalities              39
Anovulatory                          15
Past history of Endo                 33
Fibroids                             25
Hormone induced                      18
#Preg                                10
Gyn Cancer                           31
Hist of Pelvic Surgery               34
PID                                  23
Uterine/Tubal Anomalies              24
#Abort                               12
#Birth                               11
Other STD                            21
Abdominal Pain                       38
Unknown                              16
Vag Infections                       22
Abnormal PAP/Dysplasia               30
Dysfunctional Uterine Bld            27
Oligoovulatory                       17
Polycystic Ovarian Synd              29
Autoimmune Disease                   6
Genital Warts                        20
Other Symptoms                       45
Ovarian Cyst                         28
Other history                        32
Pelvic Pain                          37
age(preproc)                         1
Preg Induced DM                      3
Preg HTN                             5
Adnexal masses/Thickening            43
Undetermined                         44
Diabetes                             2
Drug Use                             9
Hypertension                         4
Transplant                           7



The variables chosen in the variable selection procedure were as
follows, showing the ranking from the final sensitivity analysis:

Name                                 Input #
Hist of Infert                       13
Dyspareunia                          41
Ectopic Preg                         26
Packs/Day                            8
Hist of Pelvic Surgery               34
Medication History                   35
Ovulatory                            14
Menstrual Abnormalities              39
Oligoovulatory                       17



The comparison of the Score network to the Endo present network 
can be performed by forcing a threshold on the desired score output to 
produce an Endo present comparison. The results for the score and the 
pat07 networks are shown below. 



NETWORK          PAT07       ADH07
Sensitivity      .828179     .825083
Specificity      .598174     .473430
PPV              .732523     .696379
NPV              .723757     .649007
Odds Ratio       2.695       2.148148
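Forcing a threshold on a continuous output turns it into a positive/negative call that can be tallied against the Endo present label. The sketch below shows that step, assuming a score at or above a hypothetical cutoff counts as positive. The cutoff actually applied to the Score network output is not stated here, and the function names and data are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def threshold_confusion(scores: Sequence[float], truth: Sequence[int],
                        cutoff: float) -> Dict[str, int]:
    """Call a case positive when its score is >= cutoff and tally it against 0/1 truth."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for score, actual in zip(scores, truth):
        predicted = score >= cutoff
        if predicted and actual:
            counts["TP"] += 1
        elif not predicted and not actual:
            counts["TN"] += 1
        elif predicted:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts


def sensitivity_specificity(counts: Dict[str, int]) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    sens = counts["TP"] / (counts["TP"] + counts["FN"])
    spec = counts["TN"] / (counts["TN"] + counts["FP"])
    return sens, spec


# Hypothetical scores and Endo present labels.
counts = threshold_confusion([0.9, 0.7, 0.4, 0.2, 0.6, 0.1], [1, 1, 1, 0, 0, 0], cutoff=0.5)
print(counts)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
print(sensitivity_specificity(counts))
```

Raising the cutoff generally trades sensitivity for specificity, so the comparison above depends on where the threshold is placed.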



Conclusions 

The set of variables identified in this Example appears to be
reasonable. The automated variable selection methodology appears to
function properly. The choice of variables is well predicted by the
sensitivity analysis.

EXAMPLE 9 

This example shows the reproducibility of the process provided herein.

Methodology Used 
Software used for the selection of important variables for
Adhesions and Score was modified to operate with the Endo present
desired output. The software was further modified to allow it to be run in
the general case instead of needing to be recompiled for each specific
test.

The run was made on the Endo present variable in the same
fashion as the runs for Adhesions and Score. This included using a
consensus of 4 networks during the variable selection process. The
training data was partitioned into five partitions during the training
process, generating a total of 20 networks for each evaluation of the
current set of variables being tested.

The results of runs with different random number seeds indicated 
that the number of networks in the consensus might need to be 
increased. 
Two additional variable selection runs were made with a consensus
of 10 networks used during the process. In this case, a total of 50
networks were trained to evaluate a single combination of variables. Two
separate runs were made in this fashion with only the random starting
seed changing.

From these final two variable selection runs, a set of eight
networks was trained for each of the sets of variables (pat08, pat09), to
allow their performance to be evaluated on new data (not included in the
original 510 record database). Statistics on the performance of these
networks were generated so that they could be compared to the original
pat07 consensus nets.
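The network counts above follow from multiplying the number of partitions by the consensus size: 5 × 4 = 20 and 5 × 10 = 50. The sketch below shows one way such a partitioned holdout evaluation with consensus averaging could be written. It assumes interleaved partitions and a 0.5 cutoff on the averaged output. It also assumes a hypothetical `train_and_predict` callable that stands in for training one network with a given random seed. None of these details are specified in the text.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: (training indices, held-out indices, seed) -> one output per held-out record.
Predictor = Callable[[List[int], List[int], int], List[float]]


def partition_indices(n_records: int, n_partitions: int) -> List[List[int]]:
    """Split record indices 0..n_records-1 into interleaved partitions."""
    return [list(range(p, n_records, n_partitions)) for p in range(n_partitions)]


def networks_per_evaluation(n_partitions: int, consensus_size: int) -> int:
    """One consensus of networks is trained for every held-out partition."""
    return n_partitions * consensus_size


def holdout_consensus_accuracy(labels: Sequence[int], n_partitions: int,
                               consensus_size: int, train_and_predict: Predictor,
                               cutoff: float = 0.5) -> float:
    """Hold out each partition in turn, average the consensus outputs, and score them."""
    n = len(labels)
    correct = 0
    for held_out in partition_indices(n, n_partitions):
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        # Same training data, different random seeds: one output list per network.
        outputs = [train_and_predict(train, held_out, seed) for seed in range(consensus_size)]
        for k, idx in enumerate(held_out):
            average = sum(net[k] for net in outputs) / consensus_size
            correct += int((average >= cutoff) == bool(labels[idx]))
    return correct / n


print(networks_per_evaluation(5, 4), networks_per_evaluation(5, 10))  # 20 50
```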
Results 

In each case where different random number seeds were used, the 
variable selection process found a different set of important variables. 
When the number of networks in the consensus was increased to 10, the 
variables common in the different runs increased. 

Many of the original 14 variables used for pat07 were confirmed as 
being important in the variable selection runs using 10 consensus nets. 
The final runs made on the selected variables were named pat08 and 
pat09. 

The variables used in pat08 and pat09 consensus networks are 
shown below along with their sensitivity analysis rankings: 



PAT08
Input   Variable Name               Average
10:     Past HX of Endo             1.525819
05:     #Birth                      1.489470
14:     Dysmenorrhea                1.302913
12:     Medication History          1.263276
11:     Hist of Pelvic Surgery      1.235421
13:     Pelvic Pain                 1.192193
01:     Age(preproc)                1.178076
04:     Packs/Day                   1.063709
06:     Hist of Infert              0.943830
07:     Hormone Induced             0.925489
16:     Other Symptoms              0.879384
15:     Infertility                 0.873955
09:     Abnormal PAP/Dysplasia      0.7061778
03:     Preg HTN                    0.576799
08:     Herpes                      0.441410
02:     Diabetes                    0.402080

PAT09
Input   Variable Name               Average
06:     Past HX of Endo             1.420401
04:     #Birth                      1.353489
01:     age(preproc)                1.187359
09:     Pelvic Pain                 1.183929
10:     Dysmenorrhea                1.141531
07:     Hist of Pelvic Surgery      1.064250
03:     Packs/Day                   1.042488
11:     Other Symptoms              0.780530
05:     Herpes                      0.699654
08:     Medication History          0.623505
02:     Preg HTN                    0.502863
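The Average column gives, for each input, a sensitivity value averaged over the networks in the consensus, and each table is sorted on that average. The sketch below covers only that aggregation step. It assumes the per-network sensitivity values have already been computed by whatever sensitivity measure was used, which is not reproduced here. The input names and numbers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def consensus_sensitivity_ranking(per_network: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Average each input's sensitivity over the consensus and sort, largest first."""
    averages = {name: sum(vals) / len(vals) for name, vals in per_network.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-network sensitivities for three inputs in a consensus of two networks.
ranking = consensus_sensitivity_ranking({
    "Input A": [1.2, 1.8],
    "Input B": [1.0, 1.2],
    "Input C": [0.2, 0.4],
})
for name, average in ranking:
    print(f"{name:10s}{average:.6f}")
```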



Conclusions 

The variable selection process appears to work well and has
produced two alternative networks that work as well as or better than the
pat07 nets. The reason for this conclusion is that the performance
statistics generated only on the training data appear slightly better for
pat07 than for pat08 and pat09. Since the variable selection process
carefully picks variables based on test set performance, the associated
networks are not likely to have been overtrained. As a network becomes
overtrained, the typical characteristic is that the training example
performance increases and the test set performance decreases. Thus, the
higher performance of pat07 may be the result of slight overtraining.

While the variable selection process appears to have produced two
alternative selections on the same training data, the performance of the
two selections appears to be very similar. This is based on the test set
performance of the final variable selections for the two runs. It has
become clear that when two variables are close in their relative
performance, random factors can influence their relative ranking.
The random factors in the variable selection runs included the random
starting points and the use of added noise on the inputs during training.
The random noise has been shown to aid in producing better
generalization (that is, better test set performance). As the number of
networks in the consensus increases, the effects of the random
influences are decreased.
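A small simulation can illustrate why random influences shrink as the consensus grows. This sketch rests on a strong simplifying assumption: each network's output is modeled as a common true value plus independent Gaussian noise. Under that assumption, the spread of the averaged output falls roughly as one over the square root of the number of networks. Real networks trained on the same data make correlated errors, so the actual reduction would likely be smaller. The function name and parameters are hypothetical.

```python
import random
import statistics


def consensus_spread(n_networks: int, trials: int = 2000, sigma: float = 1.0,
                     seed: int = 0) -> float:
    """Std. dev. of a consensus average when each network adds independent noise."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Each simulated network outputs the same underlying value (0.5) plus noise.
        outputs = [0.5 + rng.gauss(0.0, sigma) for _ in range(n_networks)]
        means.append(sum(outputs) / n_networks)
    return statistics.pstdev(means)


for n in (1, 4, 10, 20):
    print(n, round(consensus_spread(n), 3))
```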

The determination of a set of variables that produces a high quality
network seems to be addressed by the variable selection process. As
more combinations of variables that work successfully are enumerated, it
is evident that certain variables or combinations of variables are essential
to good performance.

EXAMPLE 10 

Evaluation of the Elimination of Past History of Endometriosis and History
of Pelvic Surgery on Diagnostic Performance.

The purpose of this Example was to determine the importance of 'Past 
history of endometriosis' and 'Past history of pelvic surgery' variables in 
evaluating a patient's risk of having endometriosis, and to provide an 








24727-801 F 



alternative means (different from sensitivity analysis) to measure the 
importance of any given variable in predicting the outcome. 
Tasks: 

1. Apply the variable selection process excluding the 'Past history of
endometriosis' variable.

2. Repeat task (1) using different random seed variables for the 
variable selection process. 

3. For both sets of 'endometriosis relevant variables' identified in 
tasks (1) and (2) above, complete the consensus network training 
process. 

4. Repeat the above tasks (1), (2) and (3) excluding the 'Past history 
of Pelvic surgery' variable from the Endometriosis database. 

5. Repeat the above tasks (1), (2), and (3) excluding both the 'Past
history of endometriosis' and the 'Past history of pelvic surgery'
variables from the Endometriosis database.

Methodology Used 

The variable selection software developed in Example 9 was used
as the basis to generate results for each of the tasks of Example 10. The
software was modified so that the user could identify variables that would be
excluded from consideration based on the requirements of Example 10.
This software was also modified to allow the reporting of classification 
performance for each of the sets of variables tested so that the effect of 
an eliminated variable could be more easily understood. 

For each variable selection run that was made, parameters for the
variable selection process were set as follows:

Number of partitions: 5

Consensus networks: 10

Training example size: 510

Number of passes: 999
The ordering of database variables in the variable selection process
was based on the sensitivity analysis and Chi square analysis. This
ordering was the same as used in pat08 and pat09.

The networks trained for this Example are identified as follows (the
two nets have different random seeds):

Past Hist. of Endo eliminated: pat10, pat11.

Hist. of Pelvic Surgery eliminated: pat12, pat13.

Both variables eliminated: pat14, pat15.

Once the variable selection process was completed for each of the
combinations of variables and random seeds, a set of eight networks
was trained using the selected variables identified. Each of these
networks was trained on the complete 510 record database. From these
training runs, a consensus of the outputs was generated in an Excel
spreadsheet so that the performance of each of the networks could be
evaluated.
Results 

The typical performance of a consensus of networks was estimated
using the holdout method with a partition of 5. When all variables were
available, as in pat08 and pat09, the classification performance was
estimated to be 65.23%.

When the past history of endometriosis variable was eliminated
from consideration (pat10 and pat11), the performance was estimated at
62.47%. This represents a drop of 2.76%.

When the history of pelvic surgery variable was eliminated from
consideration (pat12 and pat13), the performance was estimated at
64.52%. This represents a drop of only 0.72%.

When both variables were eliminated from consideration (pat14 and
pat15), the performance was estimated to be 62.43%. This represents a
drop of 2.80%. This is only slightly worse than just eliminating past
history of endometriosis and appears to be consistent with other results
based on the assumption that the variables are independent (are not
correlated).
Conclusions 

While history of pelvic surgery was used by the neural networks 
when it was available, the effect of eliminating this variable was minimal. 
The neural networks appear to be able to compensate for the elimination 
of this variable by using other information. 

The removal of Past history of Endometriosis was significant. This 
variable was always at the top of the list in any sensitivity analysis. Its 
elimination caused about a 2.76% drop in performance over the average 
when all variables were available for use. Given that the average 
performance is estimated at 65.23%, and 50% can be achieved by 
chance, this represents an effective drop in performance of 18.12%. 
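The 18.12% figure measures the 2.76 point loss against the margin above the 50% chance level rather than against the full 65.23%. The sketch below reproduces that arithmetic. The function name is hypothetical.

```python
def effective_drop(baseline_pct: float, reduced_pct: float, chance_pct: float = 50.0) -> float:
    """Performance lost, as a fraction of the margin the baseline holds over chance."""
    return (baseline_pct - reduced_pct) / (baseline_pct - chance_pct)


# All variables: 65.23%; Past history of endometriosis eliminated: 62.47%.
print(f"{100 * effective_drop(65.23, 62.47):.2f}%")  # 18.12%
```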

When both variables were eliminated, there does not appear to be
any significant additional drop in performance beyond that caused by
eliminating past history of endometriosis alone, which indicates there
was no interaction between these two variables. This process of
eliminating a variable and running the variable selection process appears
to be a good approach to determining the true value of a given variable. It
should be noted that if there were two variables that were important to the
diagnosis but highly correlated, the elimination of only one would have
little effect, since the network would compensate by using the other. It is
only when both were eliminated that their value would become clear.

EXAMPLE 11

Evaluation of the Elimination of Pelvic Pain and Dysmenorrhea on
Diagnostic Performance

Requirements 

Goals: 

1. To determine the importance of 'Pelvic Pain' and 'Dysmenorrhea'
variables in evaluating a patient's risk of having endometriosis.
2. To provide a separate mechanism (different from sensitivity
analysis) to measure the importance of any given variable in
predicting the outcome.

Tasks:

1. Apply the variable selection process described herein.

2. Repeat task (1) using different random seed variables for the
variable selection process.

3. For both sets of 'endometriosis relevant variables' identified in
tasks (1) and (2) above, complete the consensus network
training process.

4. Repeat the above tasks (1), (2) and (3) excluding the
'Dysmenorrhea' variable from the Endometriosis database.

5. Repeat the above tasks (1), (2), and (3) excluding both the 'Pelvic
Pain' and the 'Dysmenorrhea' variables from the Endometriosis
database.

Methodology Used 

The variable selection software developed in Example 9 was used
as the basis to generate results for each of these tasks.
For each variable selection run that was made, parameters for the variable
selection process were set as follows:

Number of partitions: 5

Consensus networks: 10

Training example size: 510

Number of passes: 999

The ordering of database variables in the variable selection process
was based on the sensitivity analysis and Chi square analysis. This
ordering was the same as used in pat08 and pat09. The networks trained
for this task are identified as follows (the two nets have different random
seeds):
Pelvic Pain eliminated: pat16, pat17, pat17A.

Dysmenorrhea eliminated: pat18, pat19.

Both variables eliminated: pat20, pat21.

Four variables (Exs. 11 and 12): pat22, pat23, and pat23A.

Once the variable selection process was completed for each of the
combinations of variables and random seeds, a set of eight networks
was trained using the selected variables identified. Each of these
networks was trained on the complete 510 record database. From these
training runs, a consensus of the outputs was generated in an Excel
spreadsheet so that the performance of each of the networks could be
evaluated.

Results

The typical performance of a consensus of networks was estimated
using the holdout method with a partition of 5. When all variables were
available, as in pat08 and pat09, the classification performance was
estimated to be 65.23%.

When the pelvic pain variable was eliminated from consideration
(pat16 and pat17), the performance was estimated at 61.03%. This
represents a drop of 4.20%.

When the dysmenorrhea variable was eliminated from consideration
(pat18 and pat19), the performance was estimated at 63.44%. This
represents a drop of only 1.79%.

When both variables were eliminated from consideration (pat20 and
pat21), the performance was estimated to be 61.22%. This represents a
drop of 4.00%. This is better than when pelvic pain alone was eliminated,
which suggests that the performance drop for pelvic pain is overstated.
The best performing network that did not include pelvic pain had a
performance of 62.29%, which gives a drop of 2.94%. This would be a
more reasonable estimate given the performance when both were
eliminated.

Conclusions 

With the four variables tested, the ranking of the variables in order 
of importance is as follows: 

Pelvic pain 2.94% - 4.20% drop

Past hist. of endo 2.76% drop

Dysmenorrhea 1.79% drop

Hist. of pelvic surg 0.72% drop
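The ranking above orders variables by the loss in estimated classification performance when each one is withheld from variable selection. The sketch below does that bookkeeping with the figures reported in Examples 10 and 11. The dictionary and function names are hypothetical. Subtracting the rounded percentages gives 0.71 rather than the reported 0.72 for history of pelvic surgery, presumably because the reported drop was computed from unrounded estimates.

```python
from typing import Dict, List, Tuple

BASELINE = 65.23  # all variables available (pat08 and pat09)

# Estimated classification performance (%) with one variable eliminated.
REDUCED = {
    "Pelvic pain": 61.03,  # pat16/pat17; the best single network gave 62.29
    "Past hist. of endo": 62.47,
    "Dysmenorrhea": 63.44,
    "Hist. of pelvic surg": 64.52,
}


def rank_by_drop(baseline: float, reduced: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (variable, drop in percentage points), largest drop first."""
    drops = {name: round(baseline - score, 2) for name, score in reduced.items()}
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


for name, drop in rank_by_drop(BASELINE, REDUCED):
    print(f"{name:22s}{drop:.2f}% drop")
```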

This process of eliminating a variable and running the variable 
selection process is a good approach to determining the value of a given 
variable. It should be noted that if there were two variables that were 
important to the diagnosis, but were highly correlated, the elimination of 
only one would have little effect since the network would compensate by 
using the other. It is only when both were eliminated that their true value 
would become clear. 

EXAMPLE 12 

Training of Neural Network to Differentiate Mild Versus Severe 
Endometriosis 

Goals:

1. To train a consensus of networks which differentiates between
minimal/mild versus moderate/severe endometriosis.

Tasks:

1. Train networks to AFS score as follows:

positive = Endo Stage III or IV
negative = No Endo, Endo Stage I or II

2. Apply the variable selection process described in "Method for
Developing Medical and Biochemical Tests Using Neural Networks"
to the Endometriosis database.
3. Repeat task (2) using different random seed variables for the
variable selection process.

4. Compare variables selected in (2) and (3) above before proceeding.
If the selected sets of variables differ widely, repeat task (2) using
different random seed weights.

5. Train final consensus networks for variables selected in (2) and (3)
above.

6. Repeat steps (2) through (5) using only the subset of the 
endometriosis database for which Endo was present in the patient. 

Methodology Used 

The variable selection software developed in EXAMPLE 10 and
modified in EXAMPLE 11 was used as the basis to generate results for
each of the tasks in this example.

For each variable selection run that was made, parameters for the
variable selection process were set as follows:

Number of partitions: 5

Consensus networks: 20

Training example size: 510 (290 for step (6))

Number of passes: 999

The ordering of database variables in the variable selection process was
based on the sensitivity analysis and chi square analysis run specifically for
the new target output described in this Example. The networks trained for
this Example are identified as follows (the two nets have different random
seeds):

Nets trained on full database: AFS01 and AFS02 
Nets trained on Endo present subset: AFSEP1 and AFSEP2. 
Once the variable selection process was completed for each of the
combinations of variables and random seeds, a set of eight networks
was trained using the selected variables identified. Each of the
networks for the AFS01 and AFS02 variables was trained on the complete
510 record database. Each of the networks for the AFSEP1 and AFSEP2
variables was trained on the 291 records for which the endo present
variable was positive. From these training runs, a consensus of the
outputs was generated in an Excel spreadsheet so that the performance
of each of the networks could be evaluated.
Results

The count of variables found in the reduced subset run was smaller
than that for the runs on the full training example. The typical
performance of a consensus of networks was estimated using the holdout
method with a partition of 5. The typical classification performance for
the AFS run using the full training example was 77.22549%. The typical
classification performance on the endo present subset was 63.008621%.
If all examples were classified as negative, the performance would be
78.82% for the full training example and 65.29% for the subset. By
changing the cutoff values for positive and negative classification, better
performance than suggested by these numbers can be achieved.
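The all-negative figures are majority-class baselines, that is, the fraction of records that are negative. Back-calculating from the reported percentages suggests about 402 negatives among 510 records and about 190 among 291. Those counts are inferred here and are not stated in the text. The sketch below computes the baseline and runs a simple cutoff search. It assumes the network emits a score where larger means more likely positive. The function names and toy scores are hypothetical.

```python
from typing import Sequence, Tuple


def all_negative_accuracy(truth: Sequence[int]) -> float:
    """Accuracy obtained by calling every example negative."""
    return sum(1 for t in truth if t == 0) / len(truth)


def best_cutoff(scores: Sequence[float], truth: Sequence[int]) -> Tuple[float, float]:
    """Try each observed score as a cutoff (score >= cutoff means positive); keep the most accurate.

    Starts from an infinite cutoff, i.e. calling everything negative.
    """
    best = (float("inf"), all_negative_accuracy(truth))
    for cutoff in sorted(set(scores)):
        correct = sum(int((s >= cutoff) == bool(t)) for s, t in zip(scores, truth))
        accuracy = correct / len(truth)
        if accuracy > best[1]:
            best = (cutoff, accuracy)
    return best


print(round(100 * 402 / 510, 2), round(100 * 190 / 291, 2))  # 78.82 65.29
print(best_cutoff([0.9, 0.8, 0.3, 0.2, 0.1], [1, 0, 0, 0, 0]))  # (0.9, 1.0)
```

If the cutoff is chosen on the same records used to report performance, the result will be optimistic. In practice the cutoff would be picked on the training partitions and checked on held-out data.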
Conclusions 

The results of the variable selection runs for the full training
example and the subset of endo present examples suggest that the size
of the training example is of importance in the determination of the
important variables. It is clear that as the size of the training example
increases, more variables will be considered important. This result can
also be interpreted as an indication that more training data will improve
the variable selection process and also the overall performance of the
consensus networks used in building the diagnostic test.
EXAMPLE 13

Variable Selection and development of neural nets for predicting
pregnancy related events and improvement of the performance of tests
for fetal fibronectin

The Fetal Fibronectin Enzyme Immunoassay (fFN ELISA) detects
the presence or absence of fetal fibronectin (fFN) in cervicovaginal
secretions (see, U.S. Patent No. 5,468,619). Detection of fFN in
cervicovaginal secretions of symptomatic pregnant women between 24
and 34 completed weeks gestation is associated with preterm delivery.
This test is used to predict impending delivery within 7 or 14 days of
sampling. For prediction of delivery within 14 days of sampling for fFN,
the negative result is greater than 99% accurate. The positive result is
more difficult to interpret, and the positive predictive value is less
than 20%.

Neural networks were trained to assess the risk of preterm delivery 
using over 700 examples of pregnant women who were symptomatic for 
preterm delivery. Each example contained a multitude of information 
about that patient, including symptoms, reproductive history and other 
factors. Neural networks were trained to recognize complex patterns of 
interactions between these factors that indicate when a woman is at risk 
for preterm delivery. These neural networks are contained in the Preterm 
Delivery Risk Assessment software, which augments the fFN ELISA test 
result by decreasing false positive observations. 
A. Variables

The following are variables based on patient input data. Neural
networks using all or selected subsets of these variables may be
generated. Combinations of at least three of these variables may be
used in conjunction with decision-support systems, particularly neural
nets, to predict risk of preterm delivery or impending delivery. The inputs
for the variables are either yes, no, no answer, or a text input, such as
age. The variables, listed by type, are as follows:

1 Age
Ethnic origin variables:

2 EthOrg1: Caucasian;

3 EthOrg2: Black;

4 EthOrg3: Asian;

5 EthOrg4: Hispanic;

6 EthOrg5: Native American; and

7 EthOrg6: Other than the above.
Marital status variables:

8 MarSt1: Single;

9 MarSt2: Married;

10 MarSt3: Divorced/Separated;

11 MarSt4: Widowed;

12 MarSt5: Living with partner; or

13 MarSt6: Other than those listed above.
Education variables:

14 Edu0: Unknown;

15 Edu1: < High School;

16 Edu2: High School Graduate; or

17 Edu3: College/trade.
Patient complaint variables:

18 PATIENT COMPLAINT #1 Patient has Uterine
Contractions with or without pain;

19 PATIENT COMPLAINT #2 Patient has Intermittent lower
abdominal pain, dull, low backache, pelvic pressure;

20 PATIENT COMPLAINT #3 Patient has bleeding during the
second or third trimester;

21 PATIENT COMPLAINT #4 Patient has menstrual-like or
intestinal cramping;

22 PATIENT COMPLAINT #5 Patient has change in vaginal
discharge or amount, color, or consistency; or

23 PATIENT COMPLAINT #6 Patient is not "feeling right".
Variables from physician tests and assessments:

24 Pooling refers to visual assessment to determine whether
amniotic fluid has leaked into the vagina (see, e.g., Chapter 36, Section
18, p. 657 in Maternal Fetal Medicine: Principle and Practice, 2nd
Edition, Creasy, R.F. et al., eds., W.B. Saunders & Co. (1989));

25 Ferning refers to the results of a test to detect the pattern
formed when amniotic fluid is present in a cervical sample smeared on a
clean slide and allowed to air dry (see, e.g., Chapter 36, Section 18, p.
657 in Maternal Fetal Medicine: Principle and Practice, 2nd Edition,
Creasy, R.F. et al., eds., W.B. Saunders & Co. (1989));

26 Nitrazine refers to results from a known test used to
measure the pH of amniotic fluid that has leaked into the vagina (see,
e.g., Chapter 36, Section 18, p. 657 in Maternal Fetal Medicine:
Principle and Practice, 2nd Edition, Creasy, R.F. et al., eds., W.B.
Saunders & Co. (1989));

27 Estimated gestational age (EGA) based on last menstrual period (LMP);

28 EGA by sonogram (SONO);

29 EGA Best: EGA is the best of the EGA by SONO and
EGA by LMP, determined as follows:

if EGA by SONO is < 13 weeks, then EGA best
is EGA by SONO;
if the difference between EGA by LMP and EGA by
SONO is > 2 weeks, then EGA best is EGA by SONO;
otherwise EGA best is EGA by LMP;

30 EGA at Sampling refers to the EGA when fFN was sampled;

31 CD INTERP, which refers to cervical dilatation (interpreted
values, i.e., based on physicians' estimates), where the number will be
between 0 and 10 cm and is determined from the physician's response;

32 Gravity, which refers to the number of times the woman has
been pregnant;

33 Parity-term, which refers to the number of term births;

34 Parity-preterm, which refers to the number of preterm
births;

35 Parity-abortions, which refers to the number of
pregnancies ending in spontaneous or elective abortions;

36 Parity-living, which refers to the number of living children;

37 Sex within 24 hrs prior to sampling for fFN;

38 Vaginal bleeding at time of sampling;

39 Cervical consistency at time of sampling; and

40 UC INTERP, which refers to uterine contractions per hour
as interpreted by the physician.
Complications

41 0 COMP No previous pregnancies;

42 1 COMP have had at least one previous pregnancy
without complications;

43 2nd comp at least one preterm delivery (delivery prior to
35 weeks);

44 3rd comp at least one previous pregnancy with a
premature rupture of membrane (PROM);

45 4th comp at least one previous delivery with incompetent
cervix;

46 5 COMP at least one previous pregnancy with pregnancy
induced hypertension (PIH)/preeclampsia;

47 6 COMP at least one previous pregnancy with
spontaneous abortion prior to 20 weeks;

48 OTHER COMP at least one previous pregnancy with a
complication not listed above; and

49 fFN ELISA qualitative test result (if positive, value is 1; if
negative, value is 0).
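The rule in item 29 for deriving EGA best can be written as a small function. The sketch below assumes both estimates are in weeks. It also assumes that "difference" means the absolute difference between the LMP and sonogram estimates, since the text does not give a direction. The function name is hypothetical.

```python
def ega_best(ega_lmp_weeks: float, ega_sono_weeks: float) -> float:
    """Pick EGA best from the LMP and sonogram estimates (variable 29)."""
    if ega_sono_weeks < 13:
        # An early sonogram is used directly.
        return ega_sono_weeks
    if abs(ega_lmp_weeks - ega_sono_weeks) > 2:
        # The two estimates disagree by more than 2 weeks: use the sonogram.
        return ega_sono_weeks
    # Otherwise the LMP-based estimate is used.
    return ega_lmp_weeks


print(ega_best(14.0, 12.0), ega_best(30.0, 27.0), ega_best(30.0, 32.0))  # 12.0 27.0 30.0
```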

The variable selection protocol has been applied to these variables 
for selected outcomes, and the results are set forth below. Exemplary 
neural nets are provided. 



B. A first set of neural networks demonstrating that the methods
herein can be used to predict pregnancy related events

EGA1-EGA4

For these nets, preterm delivery is defined as less than or equal to
34 weeks, 0 days. The other nets herein (described below) define
preterm delivery as less than or equal to 34 weeks, 6 days.

Data was collected from the more than 700 test patients involved in a
clinical trial of the assay described in U.S. Patent No. 5,468,619.
Variable selection was performed without fetal fibronectin (fFN) test data.
The final networks, designated EGA1-EGA4, were trained with the variables
set forth in the table below.

EGA1 - EGA4 represent neural networks used for variable selection.
For EGA1, the variable selection protocol was performed using a network
architecture with 8 inputs in the input layer, three processing elements in
the hidden layer, and one output in the output layer. EGA2 is the same
as EGA1, except that it has 9 inputs in the input layer.
EGA3 has 7 inputs in the input layer, three processing elements in
the hidden layer, and one output in the output layer. EGA4 is the same
as EGA1, except that it has 8 inputs in the input layer.
The variables selected are as follows:



EGA1 / EGA2
fFN
Ethnic Origin 1 (Caucasian)
EGA Sonogram
EGA Best (physician's determination of estimated gestational age)
EGA Sampling
Cervical dilation interpretation
Vaginal bleeding (at time of sampling)
1 complications (prev. preg w/o complications)
Other complications (prev. preg. w complications)

EGA3 / EGA4
fFN
Ethnic Origin 4 (Hispanic)
Marital Status 5 (living with partner)
Marital Status 6 (other)
EGA Best
Cervical dilation interpretation
Vaginal bleeding (at time of sampling)
Other complications (prev. preg. w complications)



EGA = estimated gestational age 



Final consensus network performance

Net    TP   TN    FP    FN   SN     SP     PPV    NPV    OR
EGA1   35   619   92    17   67.3   87.0   27.6   97.3   6.0
EGA2   37   640   71    15   71.2   90.0   34.3   97.7   7.9
EGA3   36   602   109   16   69.2   84.7   24.8   97.4   5.1
EGA4   32   654   57    20   61.5   92.0   36.0   97.0   8.9
fFN    31   592   119   21   59.6   83.3   20.7   96.6   7.3
EGA = estimated gestational age (less than 34 weeks); TP = true positives;
TN = true negatives; FP = false positives; FN = false negatives;
SN = sensitivity; SP = specificity; PPV = positive predictive value;
NPV = negative predictive value; OR = odds ratio (total number
correct/total number incorrect answers); and fFN = the results from the
ELISA assay for fFN.
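The rates in the table follow from the four counts in each row. The sketch below recomputes them, using the legend's definition of OR as total correct over total incorrect. The function names are hypothetical. Applied to EGA2, for example, it reproduces the tabulated 71.2, 90.0, 34.3, 97.7 and 7.9. The fFN row's listed OR of 7.3 matches the conventional odds ratio (TP × TN)/(FP × FN) instead, so a second helper computes that value as well.

```python
from typing import Dict


def table_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Percentages SN, SP, PPV, NPV and the legend's OR (correct/incorrect)."""
    return {
        "SN": 100 * tp / (tp + fn),
        "SP": 100 * tn / (tn + fp),
        "PPV": 100 * tp / (tp + fp),
        "NPV": 100 * tn / (tn + fn),
        "OR": (tp + tn) / (fp + fn),
    }


def conventional_odds_ratio(tp: int, tn: int, fp: int, fn: int) -> float:
    """Diagnostic odds ratio (TP*TN)/(FP*FN)."""
    return (tp * tn) / (fp * fn)


ega2 = table_metrics(tp=37, tn=640, fp=71, fn=15)
print({k: round(v, 1) for k, v in ega2.items()})
print(round(conventional_odds_ratio(tp=31, tn=592, fp=119, fn=21), 1))  # fFN row: 7.3
print(f"{100 * (1 - 57 / 119):.0f}% fewer false positives for EGA4 than for fFN alone")
```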

The results show that the network EGA4, the neural net that
includes seven patient variables and includes the fFN ELISA assay and
that predicts delivery at less than 34 weeks, has far fewer false positives
than the fFN ELISA assay: the number of false positives was reduced by
about 50% (57 versus 119). Thus, incorporation of the fFN test into a
neural net improved the performance of the fFN ELISA assay. All of the
neural nets performed better than the fFN test alone.

Thus, the methods herein can be used to develop neural nets, as
well as other decision-support systems, that can be used to predict
pregnancy related events.

C. Neural network prediction of delivery before 35 completed
weeks of gestation - EGA5 and EGA6

The fFN-NET database was used for all the experiments;
organization of variables and order of variables was the same as
described herein. Two variable selection runs were performed on the
training data to determine the important variables to be used for the
consensus runs. In each of the runs the hidden layer contained 5
processing elements. This choice was based on the use of the variable
selection process to determine the best size of the hidden layer. Variable
selection was run with different numbers of hidden units in the neural
network. The performance of the final selection of variables was
compared for each different hidden layer configuration. Five hidden units
were found to give the best performance. Each run used a partition of 5
and a consensus of 10 networks. The top 10 variables were
examined during the run before a variable was selected for inclusion.

During these runs the biochemical test variable, fFN result, was not 
included 

in the possible variables for variable selection. 

The resulting choices of variables were then re-evaluated using a 
consensus 

of 20 networks so that the two separate runs could be compared on an 
equal 

basis. Then the fFN result variable was added to the selected variables 
and 

the selections were re-evaluated using a consensus of 20 networks. This 

allowed the effect of the biochemical test on the performance to be 
determined. The final consensus training runs, using 8 networks, were 
made using all available data for training and the best performing set of 
variables from the above evaluations with the fFN result variable included. 
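The selection runs described above follow a stepwise pattern: candidate variables are tried, the best one is kept, and the process repeats. The following is a minimal Python sketch of generic greedy forward selection, offered as an illustration rather than the exact protocol used here (the source does not spell out its stopping rule or how the top 10 candidates are ranked). The `score` callback is hypothetical; in the setting described it would stand for training a consensus of networks over a 5-way partition of the training data and returning a performance measure.

```python
from typing import Callable, List, Optional, Sequence


def forward_select(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
    max_vars: Optional[int] = None,
) -> List[str]:
    """Greedy forward selection over named input variables.

    `score` is a hypothetical callback: given a variable subset it should train
    and evaluate a model and return a value where higher is better.
    """
    selected: List[str] = []
    best = float("-inf")
    remaining = list(candidates)
    while remaining and (max_vars is None or len(selected) < max_vars):
        # Try each remaining variable added to the current subset.
        trials = {name: score(selected + [name]) for name in remaining}
        winner = max(trials, key=lambda name: trials[name])
        if trials[winner] <= best:
            break  # no remaining variable improves the score
        selected.append(winner)
        best = trials[winner]
        remaining.remove(winner)
    return selected


if __name__ == "__main__":
    # Toy scorer: "a" and "b" help, "c" does not; a small penalty per variable.
    gain = {"a": 0.3, "b": 0.2, "c": 0.0}

    def toy(subset: List[str]) -> float:
        return sum(gain[v] for v in subset) - 0.01 * len(subset)

    print(forward_select(["a", "b", "c"], toy))  # → ['a', 'b']
```

Excluding the fFN result during selection, as the text describes, amounts to leaving it out of `candidates` and adding it back to the chosen subset afterward for re-evaluation.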
1. Variable selection
Using the same database described above for EGA1-EGA4, the
variable selection protocol was applied as described above, except that
it was applied in the absence of the fFN test result. Because the
results of this test are known to be highly predictive of preterm or
impending delivery, the test was excluded from the variable selection
procedure to eliminate its overriding influence and thereby select the
important variables from among the other 48 variables.

Application of the variable selection procedure to the 48 variables 
resulted in selection of the following variables: 

1. Ethnic Origin 1: Caucasian (i.e., yes or no);
2. Marital Status 5: living with partner (yes or no);
3. EGA by sonogram;
4. EGA at sampling;
5. estimated date of delivery by best;
6. cervical dilatation (cm);
7. Parity-preterm;
8. vaginal bleeding at time of sampling;
9. cervical consistency at time of sampling; and
10. previous pregnancy without complication.




2. Neural nets 



Using these variables, two consensus networks were trained. One,
designated EGA5, was trained without the fFN ELISA test result, and the
other, designated EGA6, was trained with the fFN ELISA test result.

Fig. 17, which represents EGA6, is a schematic diagram of an
embodiment of one type of neural network 10 trained on clinical data of
the form used for the consensus network (Fig. 10) of a plurality of neural
networks. The structure is stored in digital form, along with weight
values and data to be processed, in a digital computer. This neural
network 10 contains three layers: an input layer 12, a hidden layer 14
and an output layer 16. The input layer 12 has eleven input
preprocessors 17-27, each of which is provided with a normalizer (not
shown in the figure; see the table below) that uses a mean and standard
deviation value to scale the clinical factors input into the input layer.
The mean and standard deviation values are unique to the network
training data. The input layer preprocessors 17-27 are each coupled to
the first, second and third processing elements 28, 29 and 30 of the
hidden layer 14 via paths 31-41, 42-52 and 53-63, so that each hidden
layer processing element 28, 29 and 30 receives a value or signal from
each input preprocessor 17-27. Each path is provided with a unique
weight based on the results of training on training data. The unique
weights 64-74, 75-85 and 86-96 (see also the tables below) are
non-linearly related to the output and are unique for each network
structure and initial values of the training data. The final values of the
weights are based on the initialized values assigned for network training.
The combination of the weights that result from training constitutes a
functional apparatus whose description as expressed in weights produces
a desired solution, or more specifically a risk assessment for preterm
delivery before 35 weeks.

The hidden layer 14 is biased by bias weights 97, 98 and 99
provided via paths 100, 101 and 102 to the processing elements 28, 29
and 30. The output layer 16 contains two output processing elements
103 and 104. The output layer 16 receives input from the hidden layer
processing elements 28, 29 and 30 via paths 105-110. The output layer
processing elements 103 and 104 are weighted by weights 111-116. The
output layer 16 is biased by bias weights 117 and 118 provided via paths
119 and 120 to the processing elements 103 and 104.
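The data flow just described can be sketched as a single forward pass. This is a minimal Python sketch under stated assumptions: the logistic activation is assumed (the source only states that the outputs lie between zero and one), and each weight table is read with row 0 holding the bias weights, which is how the node/weight tables below appear to be laid out (their output columns stop at row 3, matching one bias plus three hidden elements). The function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function; assumed, not stated in the source, as the activation."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(
    x: Sequence[float],
    hidden_w: Sequence[Sequence[float]],
    output_w: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """One feed-forward pass returning the output pair (A, B).

    hidden_w: (len(x) + 1) rows x n_hidden columns; row 0 holds the hidden bias
              weights, row i the weights from normalized input i.
    output_w: (n_hidden + 1) rows x 2 columns; row 0 holds the output bias
              weights, row j the weights from hidden element j.
    """
    n_hidden = len(hidden_w[0])
    if len(hidden_w) != len(x) + 1 or len(output_w) != n_hidden + 1:
        raise ValueError("weight tables do not match the input and hidden sizes")
    # Each hidden element: bias plus the weighted sum of all normalized inputs.
    hidden = [
        sigmoid(hidden_w[0][j] + sum(xi * hidden_w[i + 1][j] for i, xi in enumerate(x)))
        for j in range(n_hidden)
    ]
    # Each output element: bias plus the weighted sum of all hidden activations.
    a, b = (
        sigmoid(output_w[0][k] + sum(h * output_w[j + 1][k] for j, h in enumerate(hidden)))
        for k in range(2)
    )
    return a, b


if __name__ == "__main__":
    # An 11-3-2 network (the EGA6 topology) with all weights zero: every unit outputs 0.5.
    zeros_hidden = [[0.0] * 3 for _ in range(12)]
    zeros_output = [[0.0] * 2 for _ in range(4)]
    print(forward([0.0] * 11, zeros_hidden, zeros_output))  # → (0.5, 0.5)
```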

The preliminary risk of delivery before 35 completed weeks of
gestation is the output pair of values A and B from the two processing
elements 103 and 104. The values always lie between zero and one. One
of the indicators is indicative of a risk of preterm delivery; the other is
an indicator of the absence of such risk. While the output pair A, B
provides a generally valid indication of risk, a consensus network of
trained neural networks provides a higher confidence index. EGA6
contains 8 such trained neural networks.
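The text does not state how the outputs of the member networks are combined, nor which of A and B is the risk indicator. The sketch below assumes the simplest choice, averaging each output across members, and a hypothetical decision rule that treats A as the risk indicator and flags risk when A exceeds B; both are illustrative assumptions, and the member outputs in the usage example are made-up values.

```python
from statistics import fmean
from typing import Sequence, Tuple

Pair = Tuple[float, float]


def consensus(pairs: Sequence[Pair]) -> Pair:
    """Average the (A, B) output pairs of the member networks (assumed combining rule)."""
    if not pairs:
        raise ValueError("need at least one member network")
    return fmean(a for a, _ in pairs), fmean(b for _, b in pairs)


def indicates_risk(pair: Pair) -> bool:
    """Hypothetical decision rule: treat A as the risk indicator, flag risk when A > B."""
    risk, no_risk = pair
    return risk > no_risk


if __name__ == "__main__":
    members = [(0.75, 0.25), (0.25, 0.75), (1.0, 0.0), (0.5, 0.5)]  # made-up outputs
    combined = consensus(members)
    print(combined, indicates_risk(combined))  # → (0.625, 0.375) True
```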

The following tables set forth the values of the individual weights
for each of the 8 consensus networks, designated EGA6_0 through
EGA6_7.
EGA6_0

Input layer    hidden layer (nodes)                   output layer (nodes)
node/weight    1st          2nd          3rd          1st          2nd
0              0.412437    -0.143143    -1.885393    -0.9598620    0.945025
1              2.041149    -0.021533     0.162966    -4.839373     4.875033
2              1.224530     0.971002    -0.590964    -2.524601     2.524054
3              0.575975    -3.249891    -2.814656     2.583483    -2.561113
4              0.784864     0.600535    -0.300794
5              1.075542     0.1601136    0.549237
6             -1.047227     0.047396     0.905172
7             -0.966051     0.163156     0.630888
8             -0.193761    -0.149381     0.163185
9             -0.680552    -2.362585     1.365873
10             1.010706    -3.633732    -1.443890
11             1.728520    -0.590057     0.878588







EGA6_1

Input layer    hidden layer (nodes)                   output layer (nodes)
node/weight    1st          2nd          3rd          1st          2nd
0              2.675421    -0.552641     0.673642     0.183663     0.197713
1             -1.181347     0.284937     0.720041    -3.170281     3.095180
2             -0.178288    -1.102137     0.655263     3.795940    -3.747696
3              1.048956    -0.941387    -1.733601    -6.612447     6.498429
4              0.033454     0.927974     2.987905
5             -1.161823     1.217736     1.014796
6              6.168329     2.549298    -1.321217
7             -1.560728    -1.637513    -1.160331
8              1.671384     3.395848    -0.117778
9              0.416004     1.452099    -0.246268
10            -2.228914     1.834281     0.748248
11            -3.687070     1.693113    -0.492244






EGA6_2

Input layer    hidden layer (nodes)                   output layer (nodes)
node/weight    1st          2nd          3rd          1st          2nd
0             -1.013347     1.392476     3.390216     1.093532    -1.084186
1             -3.020375     0.554074     2.172394    -1.633913     1.632363
2             -0.899928     1.928149     0.466793    -3.099829     3.091530
3             -8.108200     0.583508     0.030467    -2.860816     2.845121
4              3.260629     9.249855     0.577971
5             -0.567385     1.008019     0.196682
6             -2.382355    -2.942121     0.568323
7             -1.996632    -2.203792    -0.852693
8              0.217054    -0.230021    -0.710703
9              0.380832    -0.276078    -1.551226
10             1.933148     0.603005    -0.856832
11            -1.922944    -1.396864    -2.356188






EGA6_3

Input layer    hidden layer (nodes)                   output layer (nodes)
node/weight    1st          2nd          3rd          1st          2nd
0              1.493395    -2.294246     2.173191    -1.417536     1.413825
1              3.959154     0.635345     0.976585    -2.381441     2.355649
2              0.396474    -1.310699     0.708136     2.652994    -2.638396
3             -0.404996    -0.906109     1.164319    -3.176520     3.136459
4             -0.113969    -0.611193    -0.896189
5              0.665321    -1.422789     0.184973
6              1.628547     2.765793     0.315556
7             -0.673276     1.645794    -0.975604
8             -2.422190     1.272992     0.612878
9             -1.494859     2.990876     0.002188
10            -0.316486    -0.614556    -0.993159
11            -3.208810    -0.869353    -3.219709






EGA6_4

Input layer    hidden layer (nodes)                   output layer (nodes)
node/weight    1st          2nd          3rd          1st          2nd
0              1.595199    -1.400935    -1.254950    -1.033706     1.017989
1              1.597543     1.434936    -1.886380    -3.i399452    3.915186
2              0.424391    -0.524230     0.974168     2.759211    -2.750812
3              1.340851     0.063071    -5.226755    -2.077351     2.087066
4              0.145379    -3.090206    -1.188423
5              0.569193    -1.556114    -1.835809
6              0.380544     3.770102    -1.193652
7             -0.414611     2.391878    -0.326348
8              0.082901     0.821397    -2.173482
9             -0.893175     0.099641    -1.615205
10             0.312568    -0.034908    -1.900884
11            -1.068789     1.023022    -1.393905






EGA6_5

Input layer    hidden layer (nodes)                   output layer (nodes)
node/weight    1st          2nd          3rd          1st          2nd
0              2.503198    -2.428604    -0.130730    -2.186942     2.173897
1             -2.192063    -3.125744     3.638620    -2.776665     2.660086
2              (illegible)  (illegible)  (illegible)  (illegible)  (illegible)
3              (illegible)  0.422544     (illegible)  (illegible)  (illegible)
4              (illegible)  (illegible)  (illegible)
5              1.992165    -3.716873    -0.868908
6             -4.089348     2.595805     3.020147
7             -2.734360     2.001578    -0.018092
8             -1.668519    -0.383332    -3.587072
9             -1.886910     0.268403    -0.229832
10            -1.519840    -1.147216     1.671855
11            -1.200146     3.289453    -4.163397







EGA6_6

Input layer    hidden layer (nodes)                   output layer (nodes)
node/weight    1st          2nd          3rd          1st          2nd
0             -1.443015     0.865813     0.382970    -2.388151     2.408045
1             -1.582839     0.593947     0.830775     4.015757    -4.056962
2             -1.119793    -0.355416     0.803208    -2.574057     2.594654
3              2.549989     0.295836     0.454763    -3.381956     3.430132
4             -3.080358    -3.033361     1.023391
5             -2.302934     0.508087    -0.703378
6             -0.040867    -2.352165    -1.982702
7              1.082370     3.718414    -4.853944
8             -0.564883    -4.419714    -2.375676
9              0.953993    -2.047337    -0.481060
10            -1.062311     0.216755    -2.037935
11             1.488106    -3.616466    -0.630520
EGA6_7

Input layer    hidden layer (nodes)                   output layer (nodes)
node/weight    1st          2nd          3rd          1st          2nd
0              1.622433     1.633779    -3.852473    -0.748768     0.742163
1              0.043906    -0.351661    -2.431170    -3.003003     2.983215
2              0.732213    -0.661362    -0.746753    -2.218790     2.184970
3             -2.027060     1.301339    -1.768983     3.052581    -3.004828
4              1.521622     1.790975    -0.154270
5              1.677837    -0.625462     0.730582
6             -1.347791    -4.165056    -0.685942
7             -1.774773     5.494371     1.034300
8             -0.827799     1.789396     0.538103
9             -0.509971    -0.183482     1.543398
10             0.605369     2.345229     1.277570
11             0.691960    -3.950886     2.871648







The EGA6 preprocessing information is the same for each of the 8
neural networks in the consensus. The input is preprocessed by
subtracting the mean value and dividing by the standard deviation.



Node    Mean         Standard Deviation
1       0.399738     0.490166
2       0.011796     0.108036
3       30.593335    4.979660
4       30.709605    5.405745
5       30.278038    2.976036
6       0.490092     0.667659
7       0.178244     0.471996
8       0.198946     0.508406
9       1.823197     0.757205
10      0.399738     0.490166
11      0.195282     0.396677
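The preprocessing step is a per-node z-score. A minimal Python sketch using the EGA6 means and standard deviations above (the constant and function names are illustrative; values are listed in node order 1-11, and the sketch does not assign variable names to the nodes because the table does not):

```python
from typing import List, Sequence

# Means and standard deviations for EGA6 input nodes 1-11, copied from the table.
EGA6_MEAN = [0.399738, 0.011796, 30.593335, 30.709605, 30.278038, 0.490092,
             0.178244, 0.198946, 1.823197, 0.399738, 0.195282]
EGA6_SD = [0.490166, 0.108036, 4.979660, 5.405745, 2.976036, 0.667659,
           0.471996, 0.508406, 0.757205, 0.490166, 0.396677]


def normalize(raw: Sequence[float],
              mean: Sequence[float] = EGA6_MEAN,
              sd: Sequence[float] = EGA6_SD) -> List[float]:
    """Subtract each node's mean and divide by its standard deviation."""
    if not (len(raw) == len(mean) == len(sd)):
        raise ValueError("expected one raw value per input node")
    return [(value - m) / s for value, m, s in zip(raw, mean, sd)]


if __name__ == "__main__":
    # A value one standard deviation above the mean on node 3 maps to about 1.0;
    # every node left at its mean maps to 0.0.
    raw = list(EGA6_MEAN)
    raw[2] += EGA6_SD[2]
    print([round(z, 6) for z in normalize(raw)])
```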




EGA5 is a set of 8 consensus networks trained similarly to EGA6,
except that the input variables did not include the variable representing
the result of the fFN ELISA test. This network can be used as a
point-of-care application to give the clinician an immediate result, rather
than waiting the 24 to 48 hours required to process the fFN sample.

D. Neural network prediction of risk of delivery within 7 days -
EGAD7 and EGAD7f

1. Variable selection

Using the same database described above for EGA1-EGA6, the
variable selection protocol was applied to prediction of the risk of
delivery within 7 days of sampling for the fFN test. As noted above for
EGA5 and EGA6, the variable selection procedure was applied in the
absence of the fFN test result. Application of the variable selection
procedure to the 48 variables resulted in selection of the following
variables:

1. Ethnic Origin 1: Caucasian (i.e., yes or no);
2. Uterine contractions with or without pain (i.e., yes or no);
3. Parity-abortions;
4. Vaginal bleeding at time of sampling;
5. Uterine contractions per hour;
6. No previous pregnancies.
2. Neural nets

Using these variables, two consensus networks were trained. One,
designated EGAD7, was trained without the fFN ELISA test result, and
the other, designated EGAD7f, was trained with the fFN ELISA test
result.

Fig. 18, which represents EGAD7f, is a schematic diagram of an
embodiment of the neural network 10 trained on clinical data of the form
used for the consensus network (Fig. 10) of a plurality of neural
networks. The structure is stored in digital form, along with weight
values and data to be processed, in a digital computer. This neural
network 10 contains three layers: an input layer 12, a hidden layer 14
and an output layer 16. The input layer 12 has seven input preprocessors
17-23, each of which is provided with a normalizer (not shown in the
figure; see the table below) that uses a mean and standard deviation
value to scale the clinical factors input into the input layer. The mean
and standard deviation values are unique to the network training data.
The input layer preprocessors 17-23 are each coupled to the first,
second, third, fourth and fifth processing elements 24-28 of the hidden
layer 14 via paths 29-35, 36-42, 43-49, 50-56 and 57-63, respectively,
so that each hidden layer processing element 24-28 receives a value or
signal from each input preprocessor 17-23. Each path is provided with a
unique weight based on the results of training on training data. The
unique weights 64-70, 71-77, 78-84, 85-91 and 92-98 (see also the
tables below) are non-linearly related to the output and are unique for
each network structure and initial values of the training data. The final
values of the weights are based on the initialized values assigned for
network training. The combination of the weights that result from
training constitutes a functional apparatus whose description as
expressed in weights produces a desired solution, or more specifically a
risk assessment of delivery within 7 days of sampling for the fFN ELISA
test.

The hidden layer 14 is biased by bias weights 99, 100, 101, 102
and 103 provided via paths 104, 105, 106, 107 and 108 to the
processing elements 24, 25, 26, 27 and 28. The output layer 16
contains two output processing elements 109 and 110. The output layer
16 receives input from the hidden layer processing elements 24-28 via
paths 111-120. The output layer processing elements 109 and 110 are
weighted by weights 121-130. The output layer 16 is biased by bias
weights 131 and 132 provided via paths 133 and 134 to the processing
elements 109 and 110.
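The reference numerals in this description can be cross-checked against the layer sizes: with 7 inputs, 5 hidden elements and 2 outputs, each weight group has exactly as many members as its numeral range. A small Python check (function names are illustrative):

```python
from typing import Dict


def weight_counts(n_inputs: int, n_hidden: int, n_outputs: int = 2) -> Dict[str, int]:
    """Number of trainable weights per group in a fully connected one-hidden-layer net."""
    return {
        "input_to_hidden": n_inputs * n_hidden,    # one weight per input-hidden path
        "hidden_bias": n_hidden,                   # one bias weight per hidden element
        "hidden_to_output": n_hidden * n_outputs,  # one weight per hidden-output path
        "output_bias": n_outputs,                  # one bias weight per output element
    }


def numeral_span(first: int, last: int) -> int:
    """How many reference numerals an inclusive range such as 64-98 covers."""
    return last - first + 1


if __name__ == "__main__":
    egad7f = weight_counts(n_inputs=7, n_hidden=5)
    # Reference numerals from the Fig. 18 description.
    assert egad7f["input_to_hidden"] == numeral_span(64, 98)     # 35
    assert egad7f["hidden_bias"] == numeral_span(99, 103)        # 5
    assert egad7f["hidden_to_output"] == numeral_span(121, 130)  # 10
    assert egad7f["output_bias"] == numeral_span(131, 132)       # 2
    print(egad7f, sum(egad7f.values()))  # 52 weights in total
```

The total of 52 also matches the cells in each EGAD7f weight table (8 rows × 5 hidden columns plus 6 rows × 2 output columns); the same check for EGA6 (11 inputs, 3 hidden) gives 44.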

The preliminary risk of delivery within 7 days from sampling for the
fFN ELISA test is the output pair of values A and B from the two
processing elements 109 and 110. The values always lie between zero
and one. One of the indicators is indicative of a risk of delivery within 7
days; the other is an indicator of the absence of such risk. While the
output pair A, B provides a generally valid indication of risk, a consensus
network of trained neural networks provides a higher confidence index.
EGAD7f contains 8 such trained neural networks.

The following tables set forth the values of the individual weights
for each of the 8 consensus networks, designated EGAD7f0 through
EGAD7f7:
EGAD7f0

Input layer    hidden layer (nodes)                                             output layer (nodes)
node/weight    1st          2nd          3rd          4th          5th          1st          2nd
0             -0.204716     1.533574     1.452831     0.129981    -1.784807     0.854229    -0.883808
1             -1.843673     1.957059    -2.668371    -0.551016     1.505628    -5.294533     5.303048
2             -1.324609     0.258418    -1.280479    -0.476101     0.827188    -7.468771     7.514580
3             -1.281561     1.697443     6.865219     4.212538    -1.953753    -5.082050     5.003566
4             -1.159086    -0.345244    -4.689749    -0.406485     1.027280     4.014138    -4.006929
5             -2.042978     0.182091     2.612433     2.399196    -1.397453    -4.105859     4.105161
6             -4.076656     1.416529     0.979842    -2.589272     0.068466
7             -0.499705    -1.383732    -2.411544     0.173131    -1.919889







EGAD7f1

Input layer    hidden layer (nodes)                                             output layer (nodes)
node/weight    1st          2nd          3rd          4th          5th          1st          2nd
0              1.522090     6.396365     1.750606     0.650769     0.673423     0.282480    -0.222861
1              1.930314     0.027271     0.386927     1.602559     3.495371    -5.126995     4.888618
2              1.578675    -0.445222     0.352425     1.305894     1.703156    -3.751147     3.752025
3              1.821893     6.258497     1.140159     1.363783    -0.717021    -5.496184     5.687717
4             -4.599618     0.218248     0.385593     0.945824     0.644622     7.713794    -8.054935
5             -2.755846    -1.799000     2.162089     1.730335    -0.388646    -3.429169     3.706028
6              0.524701     1.669467     1.741620     3.956515     4.717868
7             -2.089663    -0.190423    -1.736970     0.085315    -1.010295







EGAD7f2

Input layer    hidden layer (nodes)                                             output layer (nodes)
node/weight    1st          2nd          3rd          4th          5th          1st          2nd
0              0.554749     4.029042     1.041783     0.687361     2.078268     0.718456    -0.756554
1              0.314365    -1.614025     4.560114    -0.197290     2.352322     3.339842    -3.185465
2             -1.992577    -1.810437     2.067243    -0.021868     0.041441    -5.596330     5.470991
3             -4.762585    -6.021220     3.627642     3.505088     1.221308     0.815486    -0.906961
4              8.422636    -1.088322    -1.229308    -2.513499     0.344056    -4.076351     4.165072
5             -0.547021    -6.256763     1.108255     1.341978    -0.074222    -7.385492     7.372295
6              0.581056    -2.916328     0.639607     0.894802     2.365492
7              1.260577    -1.583044     0.882731    -1.113407    -1.657523







EGAD7f3

Input layer    hidden layer (nodes)                                             output layer (nodes)
node/weight    1st          2nd          3rd          4th          5th          1st          2nd
0              1.258939     0.778115     1.117508    -5.828234     3.275221    -0.174440     0.260818
1              1.038074     0.395096    -1.080656    -0.580291    -1.077984    -6.546609     6.515666
2             -2.174144     0.453939    -0.677622    -1.330231    -0.383479    -8.061748     8.067432
3              0.608410     2.262108     9.263388     4.024162     0.949009     4.938700    -5.060233
4              1.443697    -1.530076    -0.812837     1.549062    -1.594324     5.420476    -5.517191
5             -1.437676     0.749049     5.493512    -2.797146    -2.056666    -5.085781     5.127757
6              0.778191     1.397835    -3.635368     2.191902    -2.403500
7             -1.776540    -0.675587     0.115710     0.388203    -1.363938







EGAD7f4

Input layer    hidden layer (nodes)                                             output layer (nodes)
node/weight    1st          2nd          3rd          4th          5th          1st          2nd
0             -1.839879     0.255905     3.002103     0.886848    -0.485949    -1.461668     1.340040
1             -1.335228    -3.428058     0.665937    -1.072765    -0.372897    -1.862627     1.815599
2              0.062547     0.489211     0.946443    -3.642373     3.973801     5.835287    -5.699555
3              1.888678     1.928167     4.900952     1.928106    -1.866227    -5.463729     5.463984
4             -5.217631    -1.441138    -4.114171     0.629958    -1.615146    -5.726771     5.763464
5             -0.631546     1.735842     1.158419     0.638580    -3.276926    -7.193156     7.177080
6             -3.109977    -0.377960     1.372646     2.625961    -1.700064
7             -0.070132     1.763962    -2.234798    -1.165563    -1.845262
EGAD7f5

Input layer    hidden layer (nodes)                                             output layer (nodes)
node/weight    1st          2nd          3rd          4th          5th          1st          2nd
0             -1.456277     1.321048     1.214385     0.069355    -0.206125    -1.581118     1.811097
1              1.988970    -2.788917     1.700144    -3.790842     0.760984    -3.282460     2.842431
2             -0.889522    -1.748239     0.798888    -0.481237     0.248333    -6.391959     6.435954
3              15.258006    0.809204     4.071811    -3.751193    -6.873492    -6.817300     6.829902
4             -18.202002   -2.000871     0.021785     0.812317     0.713510     6.157183    -6.412641
5              0.440615    -0.470067    -1.578267    -0.216803    -3.315356    -7.015062     6.902892
6             -1.931575     0.510900     1.162408    -2.528233     1.405955
7             -3.758462    -0.570789    -6.338710     0.877703    -0.985724







EGAD7f6

Input layer    hidden layer (nodes)                                             output layer (nodes)
node/weight    1st          2nd          3rd          4th          5th          1st          2nd
0              1.512437    -0.333348    -0.557454    -0.790704     0.049061    -0.918761     0.804829
1             -0.704182    -0.032274    -3.201322    -0.966885    -0.213225    -2.987857     2.999401
2              0.443652    -0.736894    -0.713164    -0.709163    -0.725865    -5.682138     5.675150
3              2.734173     0.555570    -2.071605     7.636067    -7.109310     4.989255    -4.851893
4             -4.066469    -0.039688     0.313027    -0.265136     0.152398    -4.107172     4.101486
5              0.943337    -0.658673    -0.079748     3.091015    -5.459067    -5.247225     5.231175
6             -0.211375     0.247671    -2.400778     2.663087    -1.717437
7             -1.291067    -4.507938     1.526173    -0.139780    -0.451653







EGAD7f7

Input layer    hidden layer (nodes)                                             output layer (nodes)
node/weight    1st          2nd          3rd          4th          5th          1st          2nd
0              0.580523     0.319374    -0.660897     1.072931    -0.522045    -0.833235     1.016355
1              0.432923     3.916608     0.386343    -1.324510    -1.566712    -4.472839     4.433871
2             -0.312324     3.099275     0.344633    -3.254393    -1.081114    -4.873536     4.919722
3              4.019378    -5.440501    -9.105190     1.955846    -2.152612     4.971172    -5.215318
4             -0.355344     0.495595     0.543102    -2.001959    -0.989721    -3.436097     3.478752
5             -1.585942    -3.885213    -2.778485     1.068593    -1.697807    -4.098137     4.165162
6             -0.209687    -0.646458    -2.399903     0.177487     2.339257
7             -8.951553    -1.471208     0.725651    -2.732204     1.538870







The EGAD7f preprocessing information is the same for each of the
8 neural networks in the consensus. The input is preprocessed by
subtracting the mean value and dividing by the standard deviation.



Node    Mean         Standard Deviation
1       0.399738     0.490166
2       0.517693     0.500015
3       0.621232     1.030720
4       0.198946     0.508406
5       2.144928     2.291734
6       0.281782     0.450163
7       0.195282     0.396677
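Putting the EGAD7f0 weight table and the EGAD7f preprocessing table together gives the output pair of that one member network for a raw input vector. This is a sketch under the same assumptions as before, flagged here explicitly: the logistic activation is assumed rather than stated, each table is read with row 0 as the bias weights, and inputs are taken in preprocessing-node order 1-7. It covers a single member only, not the 8-network consensus; the names are illustrative.

```python
import math
from typing import Sequence, Tuple

# EGAD7f0 weights, copied from its table. Hidden columns: row 0 = bias, rows 1-7 = inputs.
HIDDEN_W = [
    [-0.204716, 1.533574, 1.452831, 0.129981, -1.784807],
    [-1.843673, 1.957059, -2.668371, -0.551016, 1.505628],
    [-1.324609, 0.258418, -1.280479, -0.476101, 0.827188],
    [-1.281561, 1.697443, 6.865219, 4.212538, -1.953753],
    [-1.159086, -0.345244, -4.689749, -0.406485, 1.027280],
    [-2.042978, 0.182091, 2.612433, 2.399196, -1.397453],
    [-4.076656, 1.416529, 0.979842, -2.589272, 0.068466],
    [-0.499705, -1.383732, -2.411544, 0.173131, -1.919889],
]
# Output columns: row 0 = bias, rows 1-5 = hidden elements.
OUTPUT_W = [
    [0.854229, -0.883808],
    [-5.294533, 5.303048],
    [-7.468771, 7.514580],
    [-5.082050, 5.003566],
    [4.014138, -4.006929],
    [-4.105859, 4.105161],
]
# EGAD7f preprocessing table, input nodes 1-7.
MEAN = [0.399738, 0.517693, 0.621232, 0.198946, 2.144928, 0.281782, 0.195282]
SD = [0.490166, 0.500015, 1.030720, 0.508406, 2.291734, 0.450163, 0.396677]


def sigmoid(z: float) -> float:
    # Assumed activation; the source only says the outputs lie between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))


def predict_egad7f0(raw: Sequence[float]) -> Tuple[float, float]:
    """Output pair (A, B) of the single member network EGAD7f0."""
    if len(raw) != len(MEAN):
        raise ValueError("expected 7 raw input values in preprocessing-node order")
    x = [(v - m) / s for v, m, s in zip(raw, MEAN, SD)]  # z-score normalization
    hidden = [
        sigmoid(HIDDEN_W[0][j] + sum(xi * HIDDEN_W[i + 1][j] for i, xi in enumerate(x)))
        for j in range(len(HIDDEN_W[0]))
    ]
    a, b = (
        sigmoid(OUTPUT_W[0][k] + sum(h * OUTPUT_W[j + 1][k] for j, h in enumerate(hidden)))
        for k in range(2)
    )
    return a, b


if __name__ == "__main__":
    # At the training means every normalized input is 0, so only the bias rows act.
    print(predict_egad7f0(MEAN))
```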




EGAD7 is a set of 8 consensus networks trained similarly to
EGAD7f, except that the input variables did not include the variable
representing the result of the fFN ELISA test. This network can be used
as a point-of-care application to give the clinician an immediate result,
rather than waiting the 24 to 48 hours required to process the fFN
sample.
E. Neural network prediction of risk of delivery within 14 days -
EGAD14f and EGAD14

1. Variable selection

Using the same database described above for EGA1-EGAD7, the
variable selection protocol was applied to prediction of the risk of
delivery within 14 days of sampling for the fFN test. As noted above for
EGA5, EGA6 and EGAD7, the variable selection procedure was applied in
the absence of the fFN test result. Application of the variable selection
procedure to the 48 variables resulted in selection of the following
variables:

1. Ethnic Origin 4: Hispanic (i.e., yes or no);
2. Marital Status 5: living with partner;
3. Uterine contractions with or without pain (i.e., yes or no);
4. Cervical dilatation;
5. Uterine contractions per hour;
6. No previous pregnancies.

2. Neural nets
Using these variables, two consensus networks were trained. One,
designated EGAD14, was trained without the fFN ELISA test result, and
the other, designated EGAD14f, was trained with the fFN ELISA test
result.

Fig. 18, which represents EGAD14f (as well as EGAD7f), is a
schematic diagram of an embodiment of the neural network 10 trained on
clinical data of the form used for the consensus network (Fig. 10) of a
plurality of neural networks. The structure is stored in digital form, along
with weight values and data to be processed, in a digital computer. This
neural network 10 contains three layers: an input layer 12, a hidden layer
14 and an output layer 16. The input layer 12 has seven input
preprocessors 17-23, each of which is provided with a normalizer (not
shown in the figure; see the table below) that uses a mean and standard
deviation value to scale the clinical factors input into the input layer.
The mean and standard deviation values are unique to the network
training data. The input layer preprocessors 17-23 are each coupled to
the first, second, third, fourth and fifth processing elements 24-28 of the
hidden layer 14 via paths 29-35, 36-42, 43-49, 50-56 and 57-63,
respectively, so that each hidden layer processing element 24-28
receives a value or signal from each input preprocessor 17-23. Each path
is provided with a unique weight based on the results of training on
training data. The unique weights 64-70, 71-77, 78-84, 85-91 and 92-98
(see also the tables below) are non-linearly related to the output and are
unique for each network structure and initial values of the training data.
The final values of the weights are based on the initialized values
assigned for network training. The combination of the weights that result
from training constitutes a functional apparatus whose description as
expressed in weights produces a desired solution, or more specifically a
risk assessment of delivery within 14 days of sampling for the fFN ELISA
test.

The hidden layer 14 is biased by bias weights 99, 100, 101, 102
and 103 provided via paths 104, 105, 106, 107 and 108 to the
processing elements 24, 25, 26, 27 and 28. The output layer 16
contains two output processing elements 109 and 110. The output layer
16 receives input from the hidden layer processing elements 24-28 via
paths 111-120. The output layer processing elements 109 and 110 are
weighted by weights 121-130. The output layer 16 is biased by bias
weights 131 and 132 provided via paths 133 and 134 to the processing
elements 109 and 110.

The preliminary risk of delivery within 14 days from sampling for
the fFN ELISA test is the output pair of values A and B from the two
processing elements 109 and 110. The values always lie between zero
and one. One of the indicators is indicative of a risk of
24727-801F 



delivery within 14 days. The other is an indicator of the absence of such
risk. While the output pair A, B provides a generally valid indication of
risk, a consensus network of trained neural networks provides a higher
confidence index. EGAD14f contains 8 such trained neural networks.

The following tables set forth the values of the individual weights
for each of the 8 consensus networks, designated EGAD14f0 through
EGAD14f7.



EGAD14f0

Input layer    hidden layer (nodes)                                             output layer (nodes)
node/weight    1st          2nd          3rd          4th          5th          1st          2nd
0             -0.191126     1.174059     0.810632     0.148573    -2.437188     0.106355    -0.108766
1             -2.921661    -0.713076     1.312931     10.427816    1.824513    -2.220130     2.198498
2             -0.848702     1.614504     2.640692    -0.445807     1.218097    -2.016395     2.005455
3             -1.008667     0.138305     1.372127     0.788516    -3.114650    -4.365818     4.349520
4             -1.422990    -1.517308    -1.632533    -3.146550     0.256047     2.291882    -2.293527
5             -2.588523    -0.733381     0.992748     1.482687     1.197727    -4.864353     4.861522
6             -3.611756    -2.669159     3.364100    -1.806442     0.833890
7             -0.516151    -2.104245    -2.052761    -0.615030    -1.621589







EGAD14f1

Input layer    hidden layer (nodes)                                             output layer (nodes)
node/weight    1st          2nd          3rd          4th          5th          1st          2nd
0              0.396502     2.426709     0.752911     1.549394    -0.064008    -0.285667     0.714618
1              1.248711     2.179334    -0.016570    -0.040113     2.457661    -3.745954     3.884410
2              1.912210     0.937177    -1.742286    -2.094312    -1.165847    -4.912591     4.966647
3             -1.018760    -1.087528    -0.344108     0.384237    -1.077692    -7.433263     7.309962
4              1.090578    -2.229295    -0.890326    -1.334206     0.822185     2.080292    -2.595363
5              1.399831    -5.077936    -0.600345     4.128439    -1.715393     5.481619    -5.611861
6              2.241531    -4.673233    -0.209741     2.954158    -4.565109
7              0.077090    -0.194145    -4.391311     3.250038    -2.360049







EGAD14f2

Input layer    hidden layer (nodes)                                             output layer (nodes)
node/weight    1st          2nd          3rd          4th          5th          1st          2nd
0              0.286926     1.855804     0.103985    -2.590399     2.265841     1.540065    -1.592696
1              1.928731     0.410516    -2.015740     1.017801     2.088775     2.433105    -2.545955
2             -0.666312    -1.178337     1.227737    -1.471309     1.922938    -4.736276     4.903823
3             -2.716156    -2.328632    -0.566546     0.854688    -0.448565    -2.220462     2.268171
4              0.654814    -0.197945    -2.256156    -0.410249    -0.792705    -4.049918     4.142265
5             -2.004537    -3.451720     3.311102     1.787226    -0.682330    -3.930044     4.036821
6             -0.947058    -1.898302    -0.131517     4.187262     2.272720
7              0.485620    -0.138471     1.038285    -1.245135    -6.442445







EGAD14f3

Input layer    Hidden layer (nodes)                                        Output layer (nodes)
node/weight         1st         2nd         3rd         4th         5th         1st         2nd
0              1.199346    1.135219    2.839737   -4.673778    2.903983   -0.702760    0.935822
1             -1.274101    1.559637    1.386395   -0.042351   -0.874145   -3.244763    3.144603
2             -0.353335    0.325171   -1.677620   -0.793429    0.788584   -4.933673    4.849451
3             -0.678281   -2.157454   -3.084480    1.009661    0.327746    3.306738   -3.432135
4              1.116566    0.128203   -2.188180    2.315793   -1.815446    4.993960   -5.098751
5             -1.277371   -0.415757   -0.080374   -0.694424   -1.022831   -4.266839    4.064770
6             -4.836841    3.738553   -0.703345    0.271620   -0.626113
7             -0.953257   -0.463343    1.314770   -0.196871   -2.372877
EGAD14f4

Input layer    Hidden layer (nodes)                                        Output layer (nodes)
node/weight         1st         2nd         3rd         4th         5th         1st         2nd
0             -1.810913   -0.014885    0.167362   -2.605120   -0.205378   -0.681096    0.709641
1              5.080080    1.259709    0.430446    0.680130   -3.098744   -3.611765    3.644697
2             -0.414857   -0.328851   -0.335724    5.756228    1.904646    4.377642   -4.419249
3              0.525909    1.767786   -0.375093    1.041263   -0.566611   -6.720907    6.647904
4             -7.166096   -0.912267   -1.948366   -1.117219   -1.237101   -2.355787    2.337121
5             -4.340267   -0.345630   -0.077869    3.853568   -2.550077   -2.249878    2.171079
6             -2.586306   -3.315458    0.378838    5.812339   -3.619375
7              0.213139   -1.546969  -10.991954   -1.186517   -0.502957







EGAD14f5

Input layer    Hidden layer (nodes)                                        Output layer (nodes)
node/weight         1st         2nd         3rd         4th         5th         1st         2nd
0             -2.439228    0.954525    1.242215  -27.696498    0.322283   -2.017057    2.095211
1              1.998281    1.928331    0.638520   -1.415280    1.871968    6.487561   -6.308325
2             -0.869648   -0.994059    0.768856    0.368344    1.457719   -4.867902    4.744858
3              0.295868   -0.257773    1.422994    0.033843   -4.658167   -2.392888    2.192236
4             -1.800394   -2.612705   -1.668799   51.649234   -0.537556    1.222661   -1.270161
5              0.992302   -0.938952    1.104910    3.731820    1.651959   -1.649461    1.594009
6             -1.787379   -1.045545    2.711432    0.288323   -0.572490
7             -0.374909   -0.877122   -1.918442  214.812434   -1.773228







EGAD14f6

Input layer    Hidden layer (nodes)                                        Output layer (nodes)
node/weight         1st         2nd         3rd         4th         5th         1st         2nd
0              3.984308   -0.300188    6.132831    1.776838    1.182643   -0.141300   -0.062816
1             -2.478863    0.891740   -0.185527   -0.442487    1.045499   -5.041497    4.985260
2              0.389668    0.650328 -289.318971    0.651142    0.169117   -7.230831    7.280185
3              0.370846    0.503667   21.787679    1.820010   -0.802930    2.464335   -2.250474
4             -0.950033   -0.054657    0.942573   -1.024688   -1.842654    2.637713   -2.636534
5              3.200645    0.464231    0.728644    1.784671   -5.371345   -3.675622    3.704625
6              0.647747    2.560388   -0.798268    3.237414   -4.493387
7             -1.276096   -1.593493   66.059880    0.493228   -0.126844







EGAD14f7

Input layer    Hidden layer (nodes)                                        Output layer (nodes)
node/weight         1st         2nd         3rd         4th         5th         1st         2nd
0              0.888004    0.521346   -0.513845    0.767983   -0.956920   -1.088033    1.264836
1              0.191409    1.634987   -0.771837   -2.402982   -1.003714   -4.407106    4.589468
2              2.233326    0.767802  -10.205298    0.362276    0.797006   -4.385751    4.466996
3             -0.588252   -5.586697    0.233547    0.586147    1.589040    5.286517   -5.562157
4             -1.544910   -0.829764    0.624734   -5.119879   -0.276545   -0.907527    0.809701
5             -0.361805    0.397313   -1.973167   -2.953926   -0.614287   -5.146765    5.284392
6             -0.136039   -1.488352   -3.541771    3.717852   -1.091340
7             -8.058644   -1.997797    1.520159   -0.638158    1.013775







The EGAD14f preprocessing information is the same for each of the 8 neural networks in the consensus. The input is preprocessed by subtracting the mean value and dividing by the standard deviation.
Node        Mean  Standard Deviation
1       0.152031            0.359287
2        0.91796            0.108036
3       0.517693            0.500015
4       0.490092            0.667659
5       2.144928            2.291734
6       0.281782            0.450163
7       0.195282            0.396677
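As a sketch of how one member of the EGAD14f consensus could be evaluated, the Python below applies the preprocessing table above and then runs the EGAD14f0 weight table as a 7-5-2 feed-forward network. It assumes, without confirmation from the text, that table row 0 holds the bias weights (as in the Example 15 weight listing, which begins with "0. Bias"), that input rows 1-7 correspond to preprocessing nodes 1-7, that output-layer rows 1-5 correspond to hidden nodes 1st-5th, and that every node uses a logistic sigmoid (one of the transfer functions named for Fig. 9). A full consensus result would average the outputs of all eight tables.

```python
import math

# Preprocessing constants for EGAD14f input nodes 1-7 (same for all 8 networks).
MEANS = [0.152031, 0.91796, 0.517693, 0.490092, 2.144928, 0.281782, 0.195282]
STDS = [0.359287, 0.108036, 0.500015, 0.667659, 2.291734, 0.450163, 0.396677]

# EGAD14f0 weight table, one list per table row. Row 0 is ASSUMED to be the bias.
# Columns 0-4 feed hidden nodes 1st-5th; columns 5-6 feed output nodes 1st-2nd.
# Rows 6 and 7 only connect to the hidden layer, so they have five entries.
EGAD14F0 = [
    [-0.191126, 1.174059, 0.810632, 0.148573, -2.437188, 0.106355, -0.108766],
    [-2.921661, -0.713076, 1.312931, 10.427816, 1.824513, -2.220130, 2.198498],
    [-0.848702, 1.614504, 2.640692, -0.445807, 1.218097, -2.016395, 2.005455],
    [-1.008667, 0.138305, 1.372127, 0.788516, -3.114650, -4.365818, 4.349520],
    [-1.422990, -1.517308, -1.632533, -3.146550, 0.256047, 2.291882, -2.293527],
    [-2.588523, -0.733381, 0.992748, 1.482687, 1.197727, -4.864353, 4.861522],
    [-3.611756, -2.669159, 3.364100, -1.806442, 0.833890],
    [-0.516151, -2.104245, -2.052761, -0.615030, -1.621589],
]


def normalize(raw):
    """Subtract each input's mean and divide by its standard deviation."""
    if len(raw) != len(MEANS):
        raise ValueError("expected 7 raw input values")
    return [(x - m) / s for x, m, s in zip(raw, MEANS, STDS)]


def sigmoid(x):
    """Numerically stable logistic function (assumed transfer function)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def forward(table, raw):
    """Return the (1st, 2nd) output-node values of one 7-5-2 network."""
    z = normalize(raw)
    hidden = []
    for j in range(5):
        total = table[0][j]  # bias weight for hidden node j
        for i, value in enumerate(z, start=1):
            total += table[i][j] * value
        hidden.append(sigmoid(total))
    outputs = []
    for k in (5, 6):
        total = table[0][k]  # bias weight for output node k
        for r, h in enumerate(hidden, start=1):
            total += table[r][k] * h
        outputs.append(sigmoid(total))
    return outputs[0], outputs[1]


if __name__ == "__main__":
    # An input equal to the training means normalizes to all zeros.
    print(forward(EGAD14F0, MEANS))
```

Which output node signals a positive result is not stated for EGAD14f, so the sketch only returns the raw pair.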



EGAD14 is a set of 8 consensus networks trained similarly to
EGAD14f, except that the input variables did not include the variable
representing the result of the fFN ELISA test. This network can be used
as a point-of-care application to give an immediate result to the clinician,
rather than waiting the 24 to 48 hours required to process the fFN sample.

EXAMPLE 14

Training of Consensus Neural Networks on Specific Subsets of Pat07
Variables

This example shows the results of a task designed to quantitate
the contribution of Pat07 variables to Pat07 performance, and to develop
endometriosis networks using minimal numbers of Pat07 variables.

Tasks:

1. Train final consensus networks using the following combinations of
Pat07 variables:

a. All 14 minus Hx Endo (13 variables total)

b. All 14 minus pelvic pain (13 variables total)

c. All 14 minus dysmenorrhea (13 variables total)

d. All 14 minus pelvic surgery (13 variables total)

2. Train final consensus networks using other combinations of Pat07
variables:

a. Hx Endo, pelvic pain, and dysmenorrhea

b. Hx Endo, pelvic pain, dysmenorrhea and Hx pelvic surgery

3. Train final consensus networks using other combinations of Pat07
variables as indicated by the above results.
Methodology Used 

Using the original patient database, training examples were
generated for each of the combinations of variables to be evaluated.
These training examples contained only the variables required for the
given consensus run. TrainDos was used in batch mode to train a set of
eight neural networks for each of the combinations of variables to be
evaluated. The networks were trained using the same parameters as the
Pat07 training runs; the only difference was the setting of the random
number seed for each network. Each network was trained on the full
510-record database.
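Building the per-run training examples amounts to projecting each patient record onto the variables a given consensus run needs. In the sketch below, the variable names, their column order, and the helper names are hypothetical; only the leave-one-out design (all 14 minus one variable) comes from Task 1 above.

```python
from typing import Dict, List, Sequence

# Hypothetical column order for the 14 Pat07 variables in each patient record.
PAT07 = [
    "age", "diabetes", "pregnancy_hypertension", "smoking_packs_per_day",
    "pregnancies", "births", "abortions", "genital_warts",
    "abnormal_pap_dysplasia", "hx_endo", "hx_pelvic_surgery",
    "medication_history", "pelvic_pain", "dysmenorrhea",
]


def leave_one_out(variables: Sequence[str], dropped: Sequence[str]) -> Dict[str, List[str]]:
    """Map each dropped variable to the list of all variables without it."""
    return {name: [v for v in variables if v != name] for name in dropped}


def project(records: Sequence[Sequence[float]], variables: Sequence[str],
            subset: Sequence[str]) -> List[List[float]]:
    """Keep only the columns named in `subset`, in the order given."""
    columns = [list(variables).index(name) for name in subset]
    return [[record[c] for c in columns] for record in records]


if __name__ == "__main__":
    runs = leave_one_out(PAT07, ["hx_endo", "pelvic_pain", "dysmenorrhea", "hx_pelvic_surgery"])
    for name, subset in runs.items():
        print(f"minus {name}: {len(subset)} variables")  # 13 each
```

Each projected set would then be handed to the training program once per random seed.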

From these training runs, a consensus of the outputs was 
generated in an Excel spreadsheet so that the performance of each of the 
networks could be evaluated. 
Results 

Because these runs were final training runs, the effects of eliminating
variables could be seen, but they did not give as clear an indication as can be
achieved by the holdout method.

Conclusions 

Using variable selection runs on the full training example to determine
the contribution of a given set of variables is not as good a method as the
evaluation method used in the variable selection process. The "holdout"
method of evaluation, with a partition of 5 and a 20-net consensus, gives a
substantially better statistic for comparing variables.
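The holdout comparison can be sketched as follows, under stated assumptions: `train` is a hypothetical stand-in for the TrainDos training step (its real interface is not given here), records are assigned to the five partitions at random, and each held-out partition is scored by averaging 20 networks trained on the other four partitions with different random seeds. The resulting out-of-sample scores can then be compared across variable subsets.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Record = Sequence[float]
Model = Callable[[Record], float]
# Hypothetical trainer interface: (records, labels, seed) -> trained model.
Trainer = Callable[[List[Record], List[int], int], Model]


def holdout_consensus_scores(records: Sequence[Record], labels: Sequence[int],
                             train: Trainer, n_partitions: int = 5,
                             n_nets: int = 20, seed: int = 0) -> List[float]:
    """Score each record with a consensus of networks trained without it."""
    rng = random.Random(seed)
    order = list(range(len(records)))
    rng.shuffle(order)
    # Deal the shuffled record indices into n_partitions roughly equal folds.
    folds = [order[p::n_partitions] for p in range(n_partitions)]
    scores = [0.0] * len(records)
    for p, held_out in enumerate(folds):
        held = set(held_out)
        train_idx = [i for i in order if i not in held]
        x = [records[i] for i in train_idx]
        y = [labels[i] for i in train_idx]
        # The networks differ only in their random seed.
        nets = [train(x, y, seed * 10_000 + p * n_nets + n) for n in range(n_nets)]
        for i in held_out:
            scores[i] = mean(net(records[i]) for net in nets)
    return scores
```

Because no record ever contributes to the networks that score it, the scores behave like predictions on unseen patients, which is what makes them suitable for comparing variable sets.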
EXAMPLE 15

METHOD AND APPARATUS FOR AIDING IN THE DIAGNOSIS OF 
ENDOMETRIOSIS USING A PLURALITY OF PARAMETERS SUITED FOR 
ANALYSIS THROUGH A NEURAL NETWORK (PAT07) 

Fig. 7 is a schematic diagram of an embodiment of one type of
neural network 10 trained on clinical data of the form used for the
consensus network (Fig. 10) of a plurality of neural networks. The
structure is stored in digital form, along with weight values and data to be
processed, in a digital computer. This first type neural network 10
contains three layers: an input layer 12, a hidden layer 14 and an output
layer 16. The input layer 12 has fourteen input preprocessors 17-30,
each of which is provided with a normalizer (not shown) which generates
a mean and standard deviation value to weight the clinical factors that
are input into the input layer. The mean and standard deviation values
are unique to the network training data. The input layer preprocessors
17-30 are each coupled to first and second processing elements 48, 50
of the hidden layer 14 via paths 51-64 and 65-78, so that each hidden
layer processing element 48, 50 receives a value or signal from each
input preprocessor 17-30. Each path is provided with a unique weight
based on the results of training on training data. The unique weights 80-93
and 95-108 are non-linearly related to the output and are unique for
each network structure and initial values of the training data. The final
values of the weights are based on the initialized values assigned for
network training. The combination of the weights that result from
training comprises a functional apparatus whose description, as expressed
in weights, produces a desired solution, or more specifically a preliminary
indicator of a diagnosis for endometriosis.

For the endometriosis test provided herein, the factors used to train
the neural network and upon which the output is based are the past
history of the disease, number of births, dysmenorrhea, age, pelvic pain,
history of pelvic surgery, smoking quantity per day, medication history,
number of pregnancies, number of abortions, abnormal PAP/dysplasia,
pregnancy hypertension, genital warts and diabetes. These fourteen
factors have been determined to be a set of the most influential (greatest
sensitivity) from the original set of over forty clinical factors. (Other sets
of influential factors have been derived; see the EXAMPLES above.)

The hidden layer 14 is biased by bias weights 94, 119 provided via
paths 164 and 179 to the processing elements 48 and 50. The output
layer 16 contains two output processing elements 120, 122. The output
layer 16 receives input from both hidden layer processing elements 48,
50 via paths 123, 124 and 125, 126. The output layer processing
elements 120, 122 are weighted by weights 110, 112 and 114, 116.
The output layer 16 is biased by bias weights 128, 130 provided via
paths 129 and 131 to the processing elements 120 and 122.

The preliminary indication of the presence, absence or severity of
endometriosis is the output pair of values A and B from the two
processing elements 120, 122. The values always lie between zero and
one. One of the indicators is indicative that endometriosis is present;
the other is indicative that endometriosis is absent. While the output
pair A, B provides a generally valid indication of the disease, a consensus
network of trained neural networks provides a higher confidence index.

Referring to Fig. 10, a final indicator pair C, D is based on an
analysis of a consensus of preliminary indicator pairs from a plurality
(specifically eight) of trained neural networks 10A-10H. The values of each
preliminary indicator pair A, B are provided to the two consensus
processors 150, 152 via paths 133-140 and 141-148. The first
consensus processor 150 processes all positive indicators. The second
consensus processor 152 processes all negative indicators. Each
consensus processor 150, 152 is an averager, i.e., it merely forms a
linear combination, such as an average, of the like indicators in the
collection of preliminary indicator pairs A, B. The resultant confidence
indicator pair is the desired result, where the inputs are the set of clinical
factors for the patient under test.
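The averaging performed by consensus processors 150 and 152 can be sketched in Python as below. It assumes the simple arithmetic mean named in the text (other linear combinations are allowed), and the example output pairs are made up for illustration, not results from the trained networks.

```python
from statistics import mean
from typing import Sequence, Tuple


def consensus(pairs: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine preliminary (A, B) pairs from several networks into (C, D)."""
    if not pairs:
        raise ValueError("at least one network output pair is required")
    c = mean(a for a, _ in pairs)  # like consensus processor 150: positive indicators
    d = mean(b for _, b in pairs)  # like consensus processor 152: negative indicators
    return c, d


if __name__ == "__main__":
    # Hypothetical outputs from eight networks 10A-10H.
    outputs = [(0.81, 0.19), (0.74, 0.27), (0.69, 0.30), (0.88, 0.12),
               (0.77, 0.22), (0.71, 0.29), (0.83, 0.18), (0.79, 0.20)]
    print(consensus(outputs))
```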

Fig. 9 illustrates a typical processor element 120. Similar
processors 48 and 50 have more input elements, and processor element
122 is substantially identical. Typical processor element 120 comprises a
plurality of weight multipliers 110, 114, 128 on respective input paths
(numbering in total herein 15, 16 or 3 per element and shown herein as
part of the processor element 120). The weighted values from the weight
multipliers are coupled to a summer 156. The output of summer 156 is
coupled to an activation function 158, such as a sigmoid transfer function
or an arctangent transfer function. The processor elements can be
implemented as dedicated hardware or as a software function.
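A processing element of the kind shown in Fig. 9, and the first type network built from two layers of such elements, can be sketched as follows. The per-node weight lists, the explicit bias argument, and the rescaling of the arctangent to the interval (0, 1) are assumptions made for illustration; the text names the sigmoid and arctangent only as examples of transfer functions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Activation = Callable[[float], float]


def sigmoid(x: float) -> float:
    """Logistic transfer function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def arctan_unit(x: float) -> float:
    """Arctangent transfer function rescaled (an assumption) to lie in (0, 1)."""
    return 0.5 + math.atan(x) / math.pi


def processing_element(inputs: Sequence[float], weights: Sequence[float],
                       bias: float, activation: Activation = sigmoid) -> float:
    """Weight multipliers -> summer (156) -> activation function (158)."""
    if len(inputs) != len(weights):
        raise ValueError("one weight is needed per input path")
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(total)


def first_type_network(z: Sequence[float],
                       hidden_w: Sequence[Sequence[float]], hidden_b: Sequence[float],
                       out_w: Sequence[Sequence[float]], out_b: Sequence[float],
                       activation: Activation = sigmoid) -> Tuple[float, float]:
    """14 normalized inputs -> hidden elements 48, 50 -> output elements 120, 122."""
    hidden: List[float] = [processing_element(z, w, bias, activation)
                           for w, bias in zip(hidden_w, hidden_b)]
    a, b = (processing_element(hidden, w, bias, activation)
            for w, bias in zip(out_w, out_b))
    return a, b
```

Keeping the activation as a parameter mirrors the text's statement that either transfer function may be used.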

A sensitivity analysis can be performed to determine the relative
importance of the clinical factors. The sensitivity analysis is performed on
a digital computer as follows. A trained neural network is run in the
forward mode (no training) for each training example (an input data group
for which the true output is known or suspected), and the output of the
network for each training example is recorded. Thereafter, the network is
rerun with each input variable, in turn, replaced by the average value of that
input variable over the entire training example. The differences in the values
of each output are then squared and summed (accumulated) to obtain
individual sums.

This sensitivity analysis process is performed for each training
example. Each of the resultant sums is then normalized according to
conventional processes so that, if all variables contributed equally to the
single resultant output, the normalized value would be 1.0. From this
information, the normalized values can be ranked in order of importance.

In analysis of clinical data, it was determined that the order of
sensitivity of factors for this neural network system is as follows: past
history of the disease, number of births, dysmenorrhea, age, pelvic pain,
history of pelvic surgery, smoking quantity per day, medication history,
number of pregnancies, number of abortions, abnormal PAP/dysplasia,
pregnancy hypertension, genital warts and diabetes.
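The sensitivity procedure described above can be sketched as follows. The normalization step is an interpretation of "conventional processes": each variable's accumulated sum is scaled by the number of variables divided by the total, which yields 1.0 for every variable when all contribute equally. `net` is any callable returning a trained network's outputs; the function names are illustrative.

```python
from typing import Callable, List, Sequence

Network = Callable[[Sequence[float]], Sequence[float]]


def sensitivity(net: Network, examples: Sequence[Sequence[float]]) -> List[float]:
    """Normalized sensitivity per input variable (1.0 = an equal share)."""
    n_vars = len(examples[0])
    averages = [sum(ex[v] for ex in examples) / len(examples) for v in range(n_vars)]
    baseline = [net(ex) for ex in examples]  # forward mode only, no training
    sums = [0.0] * n_vars
    for v in range(n_vars):
        for ex, base in zip(examples, baseline):
            probe = list(ex)
            probe[v] = averages[v]  # replace one variable by its average value
            out = net(probe)
            # Accumulate the squared change of every output.
            sums[v] += sum((o - b) ** 2 for o, b in zip(out, base))
    total = sum(sums)
    if total == 0.0:
        return [0.0] * n_vars  # no variable changes the output at all
    # Assumed normalization: equal contributions would give every variable 1.0.
    return [s * n_vars / total for s in sums]


def rank(names: Sequence[str], scores: Sequence[float]) -> List[str]:
    """Order variable names from most to least sensitive."""
    return [name for _, name in sorted(zip(scores, names), reverse=True)]
```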
A specific neural network system has been trained and has been
found to be an effective diagnostic tool. The neural network system, as
illustrated by Figs. 7 and 10, is described as follows. The weights, in order
of identification and not in order of sensitivity, for:

0. Bias
1. Age
2. Diabetes
3. Pregnancy hypertension
4. Smoking Packs/Day
5. #Pregnancies
6. #Births
7. #Abortions
8. Genital Warts
9. Abnormal PAP/Dysplasia
10. History of Endometriosis
11. History of Pelvic Surgery
12. Medication History
13. Pelvic Pain
14. Dysmenorrhea

are as follows for each of eight of the first type of neural networks 10:
First Neural Network A:

Node/          1st         2nd         1st         2nd
weight      Hidden      Hidden      Output      Output
             Layer       Layer       Layer       Layer
0      [illegible] [illegible]       -0.12        0.12
1            -1.19 [illegible]       -0.44        0.44
2      [illegible]       -2.30 [illegible] [illegible]
3             0.01       -1.48
4      [illegible] [illegible]
5      [illegible]        0.27
6      [illegible]       -1.70
7      [illegible] [illegible]
8      [illegible] [illegible]
9            -1.96       -6.19
10           -4.45        0.50
11            1.36       -0.95
12           -1.61        0.40
13           -1.97        2.38
14           -0.91        1.86







First Neural Network B:

Node/          1st         2nd         1st         2nd
weight      Hidden      Hidden      Output      Output
             Layer       Layer       Layer       Layer
0            -0.16       -1.62        0.70       -0.70
1            -3.30        0.79       -0.69        0.69
2             0.85        0.45       -0.65        0.65
3             1.00        2.14
4             1.00        3.82
5            -0.81        3.93
6             1.57        3.96
7            -1.40        2.27
8      [illegible] [illegible]
9             1.16        1.51
10           -0.80       -4.76
11           -0.01        2.83
12           -1.19        0.74
13           -1.10       -0.43
14           -2.29       -0.17







First Neural Network C:

Node/          1st         2nd         1st         2nd
weight      Hidden      Hidden      Output      Output
             Layer       Layer       Layer       Layer
0      [illegible] [illegible] [illegible]       -0.10
1             1.43        3.31       -0.90        0.90
2             0.30       -1.48        0.87       -0.87
3             1.17       -0.83
4             2.11        0.60
5            -1.16       -2.09
6            1.033       -1.39
7            -0.68       -0.40
8            -0.88       -0.19
9             0.31       -0.89
10           -1.74        1.36
11            1.62        0.59
12           -1.49       -1.11
13           -1.05        0.26
14           -0.41       1.036
First Neural Network D:

Node/          1st         2nd         1st         2nd
weight      Hidden      Hidden      Output      Output
             Layer       Layer       Layer       Layer
0             1.08       -0.03       -1.43        1.30
1             1.27       -0.58        1.39        1.28
2            -0.89       -0.46        1.28        1.17
3            -1.00       -0.94
4            -1.74        0.73
5            -0.40        0.10
6            -1.38        0.55
7             1.26       -0.79
8             1.06       -0.10
9             0.66       -1.36
10            0.71        1.01
11           -0.57        0.00
12            0.67       -0.38
13            1.89       -0.49
14           -0.90        1.57







First Neural Network E:

Node/          1st         2nd         1st         2nd
weight      Hidden      Hidden      Output      Output
             Layer       Layer       Layer       Layer
0             0.14       -3.93        0.46       -0.46
1            -2.12       -1.07       -0.52        0.51
2             8.36        1.16       -0.80        0.82
3             1.02        1.39
4             1.79        1.01
5             0.31       -1.08
6             2.87        2.33
7      [illegible] [illegible]
8      [illegible] [illegible]
9            -1.75       -0.31
10           -2.98       -1.92
11            1.72        0.59
12           -1.22        0.06
13           -2.47       -0.76
14           -1.14       -1.44







First Neural Network F:

Node/          1st         2nd         1st         2nd
weight      Hidden      Hidden      Output      Output
             Layer       Layer       Layer       Layer
0            -1.19        0.82        0.68       -0.68
1            -2.93        0.19       -0.67        0.67
2             1.19        0.72       -0.58        0.59
3             6.85        0.83
4             1.08        0.59
5             0.66        0.07
6             1.65        1.06
7            -0.28        0.51
8            -1.63        1.04
9            -1.15        1.47
10           -0.80       -1.97
11            0.43        0.97
12           -0.13       -0.91
13           -3.10        0.15
Node/          1st         2nd         1st         2nd
weight      Hidden      Hidden      Output      Output
             Layer       Layer       Layer       Layer

First Neural Network G:

Normalized Observation Value for First Type Neural Networks

0.39    0.49    0.19    0.39    0.72    0.45



Further, as provided herein, the results of biochemical tests, such
as tests in the ELISA format, may be used to produce trained augmented
neural network systems that yield a relatively higher confidence level in
terms of sensitivity and specificity. These second type neural networks are
illustrated in Fig. 8. The numbering is identical to Fig. 7, except for the
addition of a node 31 in the input layer 12 and a pair of weights 109 and
111. All of the weights in the network, however, change upon training with
the additional biochemical result. The exact weight set depends on the
specific biochemical test training example.

The training system provided herein may be used. Alternative
training techniques may also be used (see, e.g., Baxt, "Use of an Artificial
Neural Network for the Diagnosis of Myocardial Infarction," Annals of
Internal Medicine 115, p. 843 (1 December 1991); "Improving the
Accuracy of an Artificial Neural Network Using Multiple Differently
Trained Networks," Neural Computation 4, p. 772 (1992)).

In evaluating the test results, it was noted that a high score
correlated with the presence of disease and a low score with its absence,
and that extreme scores raised confidence while midrange scores reduced
confidence. The presence of endometriosis was indicated by an output of
0.6 or above, and its absence by an output of 0.4 or less. It was also
noted that higher relative scores correlated with higher relative severity of
the disease. The methods herein minimize the number of patients that
require further, often surgical, procedures to establish the presence,
absence or severity of the disease condition.
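The cut-offs above can be expressed as a small decision rule. This sketch assumes the score passed in is the consensus output that rises with the presence of disease; the wording of the labels and the treatment of the 0.4 to 0.6 band as "indeterminate" are illustrative choices, not a clinical protocol from the source.

```python
def interpret(score: float, present_at: float = 0.6, absent_at: float = 0.4) -> str:
    """Classify a consensus output using the 0.6 / 0.4 cut-offs reported above.

    `score` is assumed to be the positive consensus indicator (higher = disease).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("network outputs lie between zero and one")
    if score >= present_at:
        return "endometriosis indicated"
    if score <= absent_at:
        return "endometriosis not indicated"
    # Midrange scores carry lower confidence.
    return "indeterminate"


if __name__ == "__main__":
    for s in (0.85, 0.52, 0.12):
        print(s, interpret(s))
```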
Since modifications will be apparent to those of skill in this art, it is
intended that this invention be limited only by the scope of the appended
claims.